Programming For Non Programmers
Release 2.6.2
Steven F. Lott
CONTENTS
1 Preface
1.1 Why Read This Book?
1.2 What Is This Book About?
1.3 Audience
1.4 Conventions Used in This Book
2 Getting Started
2.1 About Python
2.2 About Programming
2.3 Let There Be Python: Downloading and Installing
2.4 Two Minimally-Geeky Problems : Examples of Things Best Done by Customized Software
2.5 Why Python is So Cool
3 Using Python
3.1 Instant Gratification : The Simplest Possible Conversation
3.2 IDLE Time : Using Tools To Be More Productive
4 Arithmetic and Expressions
4.1 Simple Arithmetic : Numbers and Operators
4.2 Better Arithmetic Through Functions
4.3 Extra Functions: math and random
4.4 Special Ops : Binary Data and Operators
4.5 More Advanced Expression Topics
5 Programming Essentials
5.1 Seeing Results : The print Statement
5.2 Turning Python Loose With a Script
5.3 Expressions, Constants and Variables
5.4 Assignment Bonus Features
5.5 Can We Get Your Input?
6 Some Self-Control
6.1 Truth and Logic : Boolean Data and Operators
6.2 Making Decisions : The Comparison Operators
6.3 Advanced Logic Operators
6.4 Processing Only When Necessary : The if Statement
6.5 While We Have More To Do : The for Statement
6.6 While We Have More To Do : The while Statement
6.7 Becoming More Controlling
6.8
16 Appendices
16.1 Debugging Tips
16.2 Bibliography
16.3 Glossary
17 Indices and Tables
Bibliography
Part I
Legal Notice
This work is licensed under a Creative Commons License. You are free
to copy, distribute, display, and perform the work under the following conditions:
Attribution. You must give the original author, Steven F. Lott, credit.
Noncommercial. You may not use this work for commercial purposes.
No Derivative Works. You may not alter, transform, or build upon this work.
For any reuse or distribution, you must make clear to others the license terms of this work.
The Walrus and the Carpenter Lewis Carroll
CHAPTER
ONE
PREFACE
1.1 Why Read This Book?
You'll need to read this book when you have the following three things happening at the same time:
You have a problem to solve that involves data and processing.
You've found that the common desktop tools (word processors, spreadsheets, databases, organizers,
graphics) won't really help. You've found that they require too much manual pointing and clicking, or
they don't do the right kinds of processing on your data.
You're ready to invest some of your own time to learn how to write customized software that will solve
your problem.
You'll want to read this book if you are a tinkerer who likes to know how things really work. For many people,
a computer is just an appliance. You may not find this satisfactory, and you want to know more. People
who tinker with computers are called hackers, and you are about to join their ranks.
Python is what you've been looking for. It is an easy-to-use tool that can do any kind of processing on any
kind of data. Seriously: any processing, any data. Programming is the term for setting up a computer to
do the processing you define on your data. Once you learn the Python language, you can solve your data
processing problem.
Our objective is to get you, a non-programming newbie, up and running. When you're done with this book,
you'll be ready to move on to a more advanced Python book: for example, a book about the Python
libraries. These libraries can help you build high-quality software with a minimum of work.
and clicking in other software tools. Also, we can create programs that do things that other desktop tools
can't do at all.
The big picture is this: the combination of the Python program plus a unique sequence of Python language
statements that we create can have the effect of creating a new application for our computer. This means
that our application uses the existing Python program as its foundation. The Python program, in turn,
depends on many other libraries and programs on your computer. The whole structure forms a kind of
technology stack, with our program on top, controlling the whole assembly.
Languages. We'll look at three facets of a programming language: how you write it, what it means, and the
additional practical considerations that make a program useful. We'll use these three concepts to organize
our presentation of the language. We need to separate these concepts to assure that there isn't a lot of
confusion between the real meaning and the ways we express that meaning.
The sentences "Xander wrote a tone poem for chamber orchestra" and "The chamber orchestra's tone poem
was written by Xander" have the same meaning, but express it in different ways. They have the same semantics,
but different syntax. For example, in one sentence the verb is "wrote"; in the other sentence it is "was written
by": different forms of the verb "to write". The first form is written in the active voice, and the second form is
called the passive voice. Pragmatically, the first form is slightly clearer and more easily understood.
The syntax of the Python language is covered here, and in the Python Reference Manual [PythonRef]. Python
syntax is simple, and very much like English. We'll provide many examples of language syntax. We'll also
provide additional tips and hints focused on the newbies and non-programmers. When you install
Python, you will also install a Python Tutorial [PythonTut] that presents some aspects of the language, so
you'll have at least three places to learn syntax.
The semantics of the language specify what a statement really means. We'll define the semantics of each
statement by showing what it makes the Python program do to your data. We'll also be able to show
where there are alternative syntax choices that have the same meaning. In addition to semantics being
covered in this book, you'll be able to read about the meaning of Python statements in the Python Reference
Manual [PythonRef], the Python Tutorial [PythonTut], and chapter two of the Python Library Reference
[PythonLib].
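As a small illustration of alternative syntax with the same meaning, here is a sketch of our own (it is not taken from the Python manuals). It adds one to a variable in two different ways; the variable names are made up for this example.

```python
# Two different spellings (syntax) of the same instruction (semantics).
# The variable names are hypothetical, chosen only for this sketch.
count_a = 10
count_a = count_a + 1   # long form: compute a new value, then assign it

count_b = 10
count_b += 1            # short form: augmented assignment

# Both variables end up holding the same value.
print(count_a == count_b)
```

Which spelling to prefer is a pragmatic question: both mean the same thing to Python, so the choice is about which one reads more clearly to people.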
In this book, we'll try to provide you with plenty of practical advice. In addition to breaking the topic
into bite-sized pieces, we'll also present lots of patterns for using Python that you can apply to real-world
problems.
Extensions. The extension libraries are part of the Python technology stack. These libraries are added
onto Python, which has the advantage of keeping the language trim and fit. Software components that you
might need for specialized processing are kept separate from the core language. Plus, you can safely ignore
the components you don't need.
This means that we actually have two things to learn. First, we'll learn the language. After that, we'll look
at a few of the essential libraries. Once we've seen that, we can see how to make our own libraries, and our
own application programs.
1.3 Audience
Programming and Computer Skills. We're going to focus on programming skills, which means we have
to presume that you already have general computer skills. You should fit into one of these populations.
You have good computer skills, but you want to learn to program. You are our target crew. Welcome
aboard.
You have some programming experience, and you want to learn Python. You'll find that most of
Getting Started is something you can probably skim through. We've provided some advanced material
that you may find interesting.
What skills will you need? How will we build up your new skills?
Skills You'll Need. This book assumes an introductory level of skill with any of the commonly-available
computer systems. Python runs on almost any computer; because of this, we call it platform-independent.
We won't presume a specific computer or operating system. Some basic skills will be required. If these are
a problem, you'll need to brush up on them before going too far in this book.
Can you download and install software from the internet? You may need to do this to get the Python
distribution kit from http://www.python.org. If you've never downloaded and installed software before,
you may need some help with that skill.
Do you know how to create text files? We will address doing this using a program called IDLE, the
Python Integrated Development Environment. If you don't know how to create folders and files, or if
you have trouble finding files you've saved on your computer, you'll need to expand those skills before
trying to do any programming.
Do you know some basic algebra? Some of the exercises make use of some basic algebra. A few will
compute some statistics. We shouldn't get past high-school math, and you probably don't need to
brush up too much on this.
How We Help. Newbie programmers with an interest in Python are our primary audience. We provide
specific help for you in a number of ways.
Programming is an activity that includes the language skills, but also includes design, debugging and
testing; we'll help you develop each of these skills.
We'll address some of the under-the-hood topics in computers and computing, discussing how things
work and why they work that way. Some things that you've probably taken for granted as a user
become more important as you grow to be a programmer.
We won't go too far into software engineering and design. We need to provide some hints on how
software gets written, but this is not a book for computer professionals; it's for computer amateurs
with interesting data or processing needs.
We cover a few of the most important modules specifically to prevent newbie programmers from
struggling or, worse, reinventing the wheel with each project. We can't, however, cover too much in
a newbie book. When you're ready for more information on the various libraries, you're also ready for
a more advanced Python book.
When you've finished with this book you should be able to do the following.
Use the core language constructs: variables, statements, exceptions, functions and classes. There are
only twenty statements in the language, so this is an easy undertaking.
Use the Python collection classes to work with more than one piece of data at a time.
Use a few of the Python extension libraries. We're only going to look at libraries that help us with
finishing a polished and complete program.
A Note on Clue Absorption. Learning a programming language involves accumulating many new and
closely intertwined concepts. In our experience teaching, coaching and doing programming, there is an upper
limit on the Clue Absorption Rate. In order to keep below this limit, we've found that it helps to build up
the language as ever-expanding layers. We'll start with a very tiny, easy to understand subset of statements;
to this we'll add concepts until we've covered the entire Python language and all of the built-in data types.
Our part of the agreement is to do things in small steps. Here's your part: you learn a language by using it.
In order for each layer to act as a foundation for the following layers, you have to let it solidify by doing small
programming exercises that exemplify the layer's concepts. Learning Python is no different from learning
Swedish. You can read about Sweden and Swedish, but you must actually use the language to get it off the
page and into your head. We've found that doing a number of exercises is the only way to internalize each
language concept. There is no substitute for hands-on use of Python. You'll need to follow the examples
and do the exercises. As you can probably tell from this paragraph, we can't emphasize this enough.
The big difference between learning Python and learning Swedish is that you can immediately interact with
the Python program, doing real work in the Python language. Interacting in Swedish can be more difficult.
The point of learning Swedish is to interact with people: for example, buying some kanelbulle (cinnamon
buns) for fika (snack). However, unless you live in Sweden, or have friends or neighbors who speak Swedish,
this interactive part of learning a human language is difficult. Interacting with Python only requires a
working computer, not a trip to Kiruna.
Also, your Swedish phrase-book gives you little useful guidance on how to pronounce words like sked (spoon)
or sju (seven), words which are notoriously tricky for English-speakers to get right. Python, however, is a
purely written language, so you don't have subtleties of pronunciation; you only have spelling and grammar.
Line 1 creates a Python dictionary, a map from key to value. In this case, the key will be a roll, a number
from 2 to 12. The value will be a count of the number of times that roll occurred.
Line 5 assures that the rolled number exists in the dictionary. If it doesn't exist, it will default, and will be
assigned a frequency count of 0.
Line 7 prints each member of the resulting dictionary.
The output from the above program will be shown as follows:
2 0.03%
3 0.06%
4 0.08%
5 0.11%
6 0.14%
7 0.17%
8 0.14%
9 0.11%
10 0.08%
11 0.06%
12 0.03%
Tool completed successfully
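The listing that these notes describe is not reproduced above, so here is a hypothetical reconstruction of what such a program might look like. It is a sketch under stated assumptions, not the original listing: the variable names are invented, its line numbers need not match the notes, and it counts each of the 36 combinations of two dice once rather than rolling at random. The print call is written so that it should behave the same way in Python 2.6 and Python 3.

```python
# Hypothetical reconstruction: count how often each total of two dice occurs.
frequency = {}                            # a dictionary: roll -> count
for die_1 in range(1, 7):
    for die_2 in range(1, 7):
        roll = die_1 + die_2
        frequency.setdefault(roll, 0)     # make sure the roll has a count, defaulting to 0
        frequency[roll] += 1

# Show each roll with its share of the 36 possible combinations.
lines = []
for roll in sorted(frequency):
    lines.append("%d %.2f%%" % (roll, frequency[roll] / 36.0))
for line in lines:
    print(line)
```

This sketch should print the same eleven lines from 2 to 12 shown above. The final "Tool completed successfully" line is presumably added by the tool used to run the script, not by the program itself.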
We will use the following type styles for references to a specific Class, method(), or attribute, which includes
both class variables and instance variables.
Sidebars
When we do have a digression, it will appear in a sidebar, like this.
Tip: Tips
There will be design tips, and warnings, in the material for each exercise. These reflect considerations and
lessons learned that aren't typically clear to starting programmers.
CHAPTER
TWO
GETTING STARTED
Tools and Toys
This part provides some necessary background to help non-programming newbies get ready to write their own
programs. If you have good computer skills, this section may be all review. If you are new to programming,
our objective is to build up your skills by providing as complete an introduction as we can. Computing has
a lot of obscure words, and we'll need some consistent definitions.
In Let There Be Python: Downloading and Installing we'll describe how to install Python. This is mostly for
folks using Windows. Mac OS X and Linux users will find they already have Python installed. This chapter
has the essential first step in starting to build programs: getting our tools organized.
We'll describe two typical problems that Python can help us solve in Two Minimally-Geeky Problems :
Examples of Things Best Done by Customized Software. We'll provide many, many more exercises and
problems than just these two. But these are representative of the problems we'll tackle.
The other thing that the distinction between program and language means is that we will focus our efforts
on learning the language. The data processing will be completely defined by a sequence of statements in
the Python language. Learning a computer language isn't a lot different from learning a human language,
making our job relatively easy. We'll be reading and writing Python in no time.
We'll look at the concepts of software design in About Programming.
For now, however, let's look at Python the program.
A script, on the other hand, is used to control a program. A script doesn't take direct control over the
processor. It doesn't rely (directly) on binary codes. The Python language is a scripting language; it
controls the computer system indirectly, via the Python binary program.
Our programs will be scripts that control the underlying Python program.
Your operating system is a complex collection of binary executables and scripts. These operating system
programs don't solve any particular problem, but they enable the computer to be used by folks who do have
a particular problem to solve.
A binary executable's direct control over the processor is beneficial because it gives the best speed and uses
the fewest resources. However, the cost of this control is the relative opacity of the coded instructions that
control the processor chip. The processor instruction codes are focused on the electronic switching arcana
of gates, flip-flops and registers. They are not focused on data processing at a human level. If you want to
see how complex and confusing the processor chip can be, go to Intel or AMD's web site and download the
technical specifications for one of their processors.
One subtlety that we have to acknowledge is that even the binary applications don't have complete control
over the entire computer system. A computer system loads a kernel of software when it starts. The parts
we interact with are actually outside this kernel. The binary applications we use do parts of their work by
using the kernel. This important design feature of the operating system assures that all of the application
programs behave consistently and share resources politely.
Binary Codes
Binary codes were invented by the inhabitants of the planet Binome, the Binome Individual uniTs, or
BITs. These creatures had two hands of four fingers each, giving them eight usable digits instead of the
ten that most Earthlings have. Unlike Earthlings, who use their ten fingers to count to ten, the BITs
use only their right hands and can only count to one.
If their hand is down, that's zero. If they raise their hand, that's one. They don't use their left hands
or their fingers. It seems like such a waste, but the BITs have a clever work-around.
If a BIT wants to count to a larger number, say ten, they recruit three friends. Four BITs can then
choose positions and count to ten with ease. The right-most position is worth 1. The next position to
the left is worth 2. The next position is worth 4, and the last position is worth 8.
The final answer is the sum of the positions with hands in the air.
Say we have BITs named Alpha, Bravo, Charlie and Delta standing around. Alpha is in the first
position, worth only 1, and Delta is in the fourth position, worth 8. If Alpha and Charlie raise their
hands, this is positions worth 1 and 4. The total is 5. If all four BITs raise their hands, it's 8+4+2+1,
which is 15. Four BITs have 16 different values, from zero (all hands down) to 15 (all hands up).
Delta (8)   Charlie (4)   Bravo (2)   Alpha (1)   total
down        down          down        down        0
down        down          down        up          1
down        down          up          down        2
down        down          up          up          2+1=3
down        up            down        down        4
down        up            down        up          4+1=5
down        up            up          down        4+2=6
down        up            up          up          4+2+1=7
up          down          down        down        8
up          down          down        up          8+1=9
up          down          up          down        8+2=10
A party of eight BITs can show 256 different values from zero to 255. A group of thirty-two BITs can
count to over 4 billion.
The reason this scheme works is that we only have two values: on and off. This two-valued (binary)
system is easy to build into electronic circuits: a component is either on or off. Internally, our processor
chip works in this binary arithmetic scheme because it's fast and efficient.
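The counting scheme in this sidebar can be sketched in a few lines of Python. This is our own illustration, not part of the original sidebar; the dictionary and function names are made up for the example.

```python
# Position values for the four BITs named in the sidebar.
POSITION_VALUES = {"Delta": 8, "Charlie": 4, "Bravo": 2, "Alpha": 1}

def total(raised_hands):
    """Sum the position values of every BIT whose hand is up."""
    return sum(POSITION_VALUES[name] for name in raised_hands)

print(total(["Alpha", "Charlie"]))                    # 1 + 4
print(total(["Alpha", "Bravo", "Charlie", "Delta"]))  # 8 + 4 + 2 + 1

# Each extra BIT doubles the number of different values a group can show.
print(2 ** 8)     # eight BITs
print(2 ** 32)    # thirty-two BITs
```

The last two lines show where the figures of 256 values and "over 4 billion" come from: they are 2 multiplied by itself 8 times and 32 times.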
3. The OS provides the file name you double-clicked (roulette.py) to the Python program.
4. The Python program reads the roulette.py file and executes the Python language statements it finds.
5. When the statements are finished the Python program has nothing more to do, so it terminates.
6. The OS releases the resources allocated to the Python program.
It turns out that step four can have some sub-steps to it. The Python program doesn't always do a simple
read of our file of statements. There's room for a small optimization in this step. Under some circumstances
(see Modules : The unit of software packaging and assembly), Python will create a compiled version of our
file to save a little bit of time.
What you'll observe are files with an extension of .pyc. These are compiled versions of a file. They're
smaller, and encoded in a way that makes them very easy for the Python program to read and work with.
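If you are curious, you can ask Python to create one of these compiled files yourself. The sketch below is our own illustration and rests on a few assumptions: it uses the standard py_compile module, it works in a throwaway folder, and the one-line script it writes is only a stand-in for a real roulette.py.

```python
import os
import py_compile
import tempfile

# Work in a temporary folder so nothing else on your computer is disturbed.
folder = tempfile.mkdtemp()
source = os.path.join(folder, "roulette.py")
compiled = os.path.join(folder, "roulette.pyc")

# Write a one-statement script. This is a stand-in, not the book's roulette.py.
script = open(source, "w")
script.write("x = 1 + 1\n")
script.close()

# Ask Python to produce the compiled version of the script explicitly.
py_compile.compile(source, cfile=compiled)

# The .pyc file now sits next to the original .py file.
print(os.path.exists(compiled))
```

Normally you never need to do this by hand; Python makes and uses these files on its own when it decides they will save time.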
takes a closer look at what a program really is and what it means to run a program. This will lead us to
the program named python (or python.exe) and how we control it with statements in the Python language.
Other skills include testing and problem analysis. Testing is a rich subject; it would double the size of
this book to talk about appropriate testing techniques. The analytical skills for inception and elaboration
don't require knowledge of Python, just common sense and clear thinking.
As we noted above in Goal-Directed Activities, a program is focused on the goal of successful completion.
Programming must, therefore, also focus on the final outcome. The difficult part is to determine two things.
What's the last step in creating the desired successful state?
What's the precondition for that last step?
Once we've figured out what the last thing to do is, we now have a new problem that focuses on the
next-to-last thing to do.
Generally, it helps to think backwards from the desired outcome to the necessary pre-conditions. At some point, we
get to a necessary condition which is so trivial that we write the first statement and we're done with the
programming effort.
Binary executable files are created by a program called a compiler. A compiler translates statements from
some starting language into the processor's native instruction codes. This leads to blazing speed. This
approach is typified by the C language. One consequence of this is that we must recompile our C language
programs for each different chip set and operating system.
The C language isn't terribly easy to read. The language was designed to be relatively easy for the compiler
to read and translate. It reflects an older generation of smaller, slower computers.
The GNU Tools. For the most part, the GNU C Compiler and C language libraries are used to write binary
executables like Python. The C language has been around for decades, and has evolved a widely-used style
that makes it appropriate for a variety of operating systems and processors. The GNU C compiler has been
designed so that it can be tailored for all processors currently used to build computers. Many companies
make processors, including Intel, National Semiconductor, IBM, Sun Microsystems, Hewlett-Packard, and
AMD. The GNU C Compiler can produce appropriate binary codes for all of these various processor chips.
In addition to the processor (or chip architecture), binary executables must also be specific to an operating
system. Different operating systems provide different kernel services and use different formats for their binary
executable files. Again, the GNU C Compiler can be made to work with a wide variety of operating systems,
producing binary executable files with all the unique features for that operating system.
The ubiquity of the GNU C compiler leads to the ubiquity of Python. By depending on the GNU C compiler,
the authors of Python assured that the python program can be compiled for any processor chip and any
operating system.
flopped. In addition to flip-flops, there are logic gates that do things like determine if two flip-flops are on at
the same time ("and"), if one of two flip-flops is on ("or"), or if a flip-flop is off ("not").
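Python has operators with the same three names, so the behavior of these gates can be sketched directly. This is only an illustration of the logic, not of the electronics; modeling a flip-flop as True (on) or False (off) is our simplification, and the variable names are made up.

```python
# Model two flip-flops as simple on/off values (a simplification for illustration).
flip_flop_a = True     # on
flip_flop_b = False    # off

both_on = flip_flop_a and flip_flop_b     # "and": are both on at the same time?
either_on = flip_flop_a or flip_flop_b    # "or": is at least one of them on?
a_is_off = not flip_flop_a                # "not": is this one off?

print(both_on)
print(either_on)
print(a_is_off)
```

Try changing the two starting values and predicting the three results before running it again.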
The designers of computers will often group the flip-flops into bunches and call them registers. These
register specific values or conditions within the processor. For example, one register may contain the
memory address of the next instruction to fetch. Another register might have a numeric value on which
we are calculating. Another register might be a clock that counts up automatically from zero when
the processor is turned on.
A computer's memory, it turns out, is just a collection of billions of flip-flops.
The processor chip does two things: it fetches instructions from memory, and executes those
instructions. The fetching part is a relatively simple process of reading data from the memory chips and
changing registers to reflect that instruction. The execution part is more complex, and involves
changing the state of other flip-flops based on the instruction itself, data in memory and the state of the
various processor registers.
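The fetch-and-execute cycle can be imitated with a toy model in Python. Everything in this sketch is invented for illustration: the instruction names LOAD, ADD and HALT, the register names, and the use of words where a real processor would use purely numeric codes.

```python
# A toy "memory" holding three made-up instructions.
memory = [
    ("LOAD", 2),      # put the value 2 into the accumulator register
    ("ADD", 3),       # add 3 to the accumulator register
    ("HALT", None),   # stop the processor
]

# Two toy registers: where to fetch next, and the value being calculated.
registers = {"next_instruction": 0, "accumulator": 0}

running = True
while running:
    # Fetch: read one instruction from memory and advance the address register.
    operation, operand = memory[registers["next_instruction"]]
    registers["next_instruction"] += 1

    # Execute: change the registers according to the instruction.
    if operation == "LOAD":
        registers["accumulator"] = operand
    elif operation == "ADD":
        registers["accumulator"] += operand
    elif operation == "HALT":
        running = False

print(registers["accumulator"])
```

A real processor does the same two steps, only with electronic flip-flops instead of a Python dictionary, and billions of times each second.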
The instructions in memory form a kind of language for controlling the processor. At this level, the
language is very primitive. Since it is narrowly focused on the ways the processor works, it is almost
incomprehensible. The language can only express a few simple imperative commands in a very precise,
essentially numeric form.
The idea that computers are controlled with a kind of language is an example of an abstraction that
has immense and far-reaching consequences.
It lets us translate from more expressive languages into the machine's native language. We call
this kind of translator a compiler.
It lets us design more expressive languages that better describe the problems we are trying to
solve.
It changes our view of computing. We are no longer controlling an electronic chip thingy; we are
capturing knowledge about data and processing.
Why can't programming be done in English? There are a number of reasons why we don't try to do
programming in English.
English is vague. More precisely, English has many subtle shades of meaning. Try to explain the
difference between "huge" and "immense". Further, English has words borrowed from a number
of languages, making it more difficult to assign precise meanings to words.
English is wordy. Data processing can be very simple; however, English is a general-purpose
language. Because we're only talking about data processing, it helps to have a number of simplifying
assumptions and definitions.
Over the years there have been a number of attempts at natural language processing, with varying
degrees of success. It takes quite a bit of computing horsepower to parse and understand general
English-language writing. All of this horsepower would then make the Python program large and slow;
a net loss in value.
In order to keep to short, focused statements, we would do well to use only a limited number of words.
We would also find it handy to allow only a few of the available English sentence forms. We should
also limit ourselves to just one verb tense. By the time we've focused ourselves to a small subset of
English, we've created an artificial language with only a small resemblance to English. We might as
well do another round of simplification and wind up with a language that looks like Python.
What if I'm no good with languages? First, we aren't learning a complete natural language like
Swedish. We're learning a small, artificial language with only about twenty kinds of statements.
Second, we aren't trying to do complex interpersonal exchanges like asking someone which bus will get
us to Slottberg in Gamla Stan. Interpersonal interactions are a real struggle because we don't have
all day to look up the right words in our phrase book. Python is all done as written exchanges: we
have hours to look things up in our various reference books, think about the response from the Python
program, and do further research on the Internet.
Also, the Python language lacks subtle shades of meaning. It is a mathematical exercise; the meanings
are cut and dried. The meanings may be novel, but the real power of software is that it captures
knowledge in a rigorous formal structure.
Why is the terminology so confusing? One of the biggest sources of confusion is the overuse of the
word "system". Almost everything related to computers seems to be a system. We have computer
systems, software systems, operating systems, systems programmers, system architects and network
systems. Most of this is just casual misuse of the words. We'll limit "system" to describing the
computer hardware system.
Another big source of confusion is overuse of "architecture" and the wandering meaning of "platform".
We'll try to avoid these words because they aren't really going to help us too much in learning Python.
However, we have software architectures and hardware architectures. The hardware architecture and
the platform are both, in essence, the processor chip and supporting electronics.
Generally, however, the biggest issue is that computers and computing involve a number of very new
concepts. These new concepts are often described by using existing words in a new sense. For example,
when we talk about computer systems being "clients" or "servers", we aren't talking about a lawyer's
customers or a restaurant's wait staff.
Windows Post-Installation
In your Start... menu, under All Programs, you will now have a Python 2.6 group that lists five things:
IDLE (Python GUI)
Module Docs
Python (command line)
Python Manuals
Uninstall Python
GUI is the "Graphic User Interface". We'll turn to IDLE in IDLE Time : Using Tools To Be More Productive.
Important: Testing
If you select the Python (command line) menu item, you'll see the Python (command line) window.
This will contain something like the following.
Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
If you hit Ctrl-Z and then Enter, Python will exit. The basic Python program works. You can skip to the
next chapter to start using Python.
If you select the Python Manuals menu item, this will open a Microsoft Help reader that will show the
complete Python documentation library.
Extras
IDLE
PythonLauncher
Update Shell Profile.command
Once you've finished installation, you should check to be sure that everything is working correctly.
Important: Testing
Now you can go to your Applications folder, and double click the IDLE application. This will open two
windows; the Python Shell window is what we need, but it is buried under a Console window.
Here's what you'll see in the Python Shell window.
Python 2.6.6 (r266:84374, Aug 31 2010, 11:00:51)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "copyright", "credits" or "license()" for more information.
****************************************************************
Personal firewall software may warn about the connection IDLE
makes to its subprocess using this computer's internal loopback
interface. This connection is not visible on any external
interface and no data is sent to or received from the Internet.
****************************************************************
IDLE 2.6.6
>>>
At the top of the window, you'll see a menu named IDLE with the menu item Quit IDLE. Use this to
finish using IDLE for now, and skip to the next chapter.
You may notice a Help menu. This has the Python Docs menu item, which you can access through the
menu or by hitting F1. This will launch Safari to show you the Python documents that you also downloaded
and installed.
You can do any one of the following alternatives to make IDLE available without a complete installation.
Don't do all of them.
1. Move the idle icons.
This is probably the simplest approach.
It's beyond the scope of this book to address the various tools that can edit files like your
~/.bash_profile.
Now you can type idle & at the Terminal prompt and run IDLE. You're ready to move to the next
chapter.
Here's what you see when there is a properly installed, but out-of-date Python on your GNU/Linux box.
slott% env python
Python 2.3.5 (#1, Mar 20 2005, 20:38:20)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1809)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> ^D
In this case, the version number is 2.3.5, which is good, but we need to install an upgrade.
Note that we typed Ctrl-D to finish using Python.
Unix is not Linux. For non-Linux commercial Unix installations (Solaris, AIX, HP/UX, etc.), check
with your vendor (Sun, IBM, HP, etc.). It is very likely that they have an extensive collection of open source
projects like Python pre-built for your UNIX variant. Getting a pre-built kit from your operating system
vendor is the best way to install Python.
The first command will upgrade Python to the latest and greatest version.
The second command will assure that the extension package named tkinter is part of your Fedora installation. It is not, typically, provided automatically. You'll need this to make use of the IDLE program used
extensively in later chapters.
Odds. If the outcome you bet on is likely, your payout is rather small. If the outcome you bet on is rare,
your payout may be huge. They call this the odds of winning. When the odds are small, the event is pretty
likely. For example, almost half the Roulette wheel has numbers colored red. Betting on red, then, is pretty
safe. Since it's about half the numbers, the payout is 1:1. If you bet $10, you could win an additional $10.
Contrast red (or black) with the number zero, which is just one of the thirty eight bins on the wheel. Since
zero is so rare, it pays off at 35:1. If you bet $10 on zero, and it comes up, you could win $350. They call
these long odds or a long shot.
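The payout arithmetic can be written down as a few lines of Python. This is only a sketch to make the numbers concrete: the function name winnings and its two parameters are our own invention for illustration, not something built into Python.

```python
def winnings(bet, odds_to_one):
    """Return the additional amount won on a bet that pays odds_to_one:1."""
    return bet * odds_to_one

# A $10 bet on red pays 1:1; a $10 bet on zero pays 35:1.
print(winnings(10, 1))     # 10
print(winnings(10, 35))    # 350
```

Don't worry about the details of def and return yet; the point is only that the payout is the bet multiplied by the odds.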
We can ask a whole family of related questions by replacing the Martingale betting system with more complex
systems. We can ask questions based on extending the Martingale system to include additional bets. This
is the beauty of writing our own simulation: we can modify our program to try out different variations on
our betting procedure.
betting system for the field by increasing their bet with each win and decreasing it with each loss. We'll
stick with a simple simulation as a way to learn Python.
2.4.4 Directions
We aren't going to describe the solutions to any of these casino game problems here; that would rob you of
the intellectual fun of working out your own solutions to these problems. Instead, we want to provide some
hints and nudges that will parallel the course this book will take.
This may already be obvious, but we're going to address these problems by writing new software in the
Python language. The reason why it is important to restate the (potentially) obvious is that in Using
Python we're going to spend time on learning to control the python program in a simple, manual way.
Then, when we write programs, we'll control python with our programs to do more sophisticated work.
Any solution to these kinds of problems will involve some simple math. Almost all computing involves some
kind of math. Business programming tends to involve the simplest math. Engineering and science can
involve some really complex math. Statistics is often in the middle ground, which is why we will look at it
closely in Arithmetic and Expressions.
By the way, in addition to math-oriented computing, there is also computing that could be termed symbolic
in nature. It might involve words or XML documents or things that aren't obviously mathematical; we'll
set this aside as atypical for newbies.
Sequential Thinking. A program in Python is often a sequence of operations. In the casino game
definitions, we saw that each game was a sequence of individual steps. We can often summarize programs by
looking at their inputs, their processing steps and their outputs. This input-process-output model reflects
the sequential order of processing: first, read the inputs; second, do the processing; third, print the outputs.
More sophisticated programs (like games or web servers) will interleave these operations. We'll look at this
in Programming Essentials.
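To make the input-process-output model concrete, here is a minimal sketch. The trip figures are borrowed from the miles-per-gallon example used in Using Python; the variable names are our own, and the inputs are simply written into the program rather than read from anywhere.

```python
# Input: the data we start with.
miles = 351.0
gallons = 18.0

# Process: do the calculation.
miles_per_gallon = miles / gallons

# Output: show the result.
print(miles_per_gallon)    # 19.5
```

Each statement is executed in order, top to bottom, which is exactly the sequential thinking described above.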
The sequence of operations is rarely fixed and immutable. With casino games, we have some bets which
are winners and some bets which are losers. We have conditional operations of collecting losing bets and
paying winning bets. Additionally, we'll have some operations which have to be repeated for a number of
simulations, or until some condition is satisfied. We'll look at this in Some Self-Control.
Our exploration of Python starts with arithmetic expressions and moves on to statements, then to sequences
of statements. We'll add conditional and iterative statements. The next step will be a simple organizing
principle called a function definition. We'll introduce this in Organizing Programs with Function Definitions
and use it to package parts of our program into useful, discrete components that can help us control the
overall complexity of our program.
Other Side Of the Coin. Beginning with Getting Our Bearings we'll turn to a different tack. The first
parts of our exploration were focused on the processing, and the procedural nature of our problems. The
second part of our exploration will look at the data and collections of data.
If we are going to simulate a number of sessions at the Roulette wheel, following our Martingale strategy,
we'll need to collect the results and do statistical analysis on the collection. We'll look at collections of data
items in Basic Sequential Collections of Data.
We'll address some programming techniques in Additional Processing Control Patterns that make our Python
programs more reliable and also a bit simpler. Simplification is a touchy subject: simplifications aren't always
appreciated until you see the more complex alternative. Further, since we're approaching Python by moving
from the elementary to the advanced, some things we'll look at will be complex but elementary. As we learn
more, we can replace them with something simple but advanced.
In More Data Collections we'll look at some additional data structures that can help us develop truly useful
solutions to our problems. These additional data structures will give us foundational knowledge of the
Python language and the built-in data types that we can use.
2.4. Two Minimally-Geeky Problems : Examples of Things Best Done by Customized Software
Successful Collaboration. When we look at our problems, we see that there is considerable interaction
among a number of objects. For example, in Roulette, we have the following kinds of things:
the wheel, which returns a random bin,
the table, which holds bets,
the player, which uses the Martingale strategy to place bets.
This interaction between player, table and wheel forms a larger thing, called the game, which lasts until the
player wins big, loses big, or has spent too much time at the table. Each game produces a final result of
zero dollars, big bucks or some number of dollars that was available when time ran out. These, in turn, are
collected for statistical analysis. An even bigger assembly of objects does the simulation and analysis. We'll
learn how to define these collaborating objects in Data + Processing = Objects.
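As a small preview of what one of these objects might look like, here is a sketch of the wheel alone. Everything in it is an assumption made for illustration: the class name Wheel, the method name spin, and the choice of an American-style layout (0, 00 and 1 through 36) to make up the thirty-eight bins. The table, the player and the game are deliberately left for you to work out.

```python
import random

class Wheel(object):
    """A Roulette wheel that returns one of its 38 bins at random."""

    def __init__(self):
        # Assumed layout: 0, 00 and the numbers 1 through 36.
        self.bins = ["0", "00"] + [str(n) for n in range(1, 37)]

    def spin(self):
        # random.choice() picks one item from the list of bins.
        return random.choice(self.bins)

wheel = Wheel()
print(len(wheel.bins))    # 38
print(wheel.spin())       # a randomly chosen bin; differs from run to run
```

The wheel bundles its data (the bins) with its processing (the spin), which is the idea behind the later chapter title.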
A lot of the basic components that make a program robust and reliable are already packaged as Python
modules, and we'll cover these in Modules : The unit of software packaging and assembly. We'll also use the
built-in modules as templates for designing our own modules; this allows us to organize our program neatly
into discrete, easy-to-manage pieces.
Our final section, Fit and Finish: Complete Programs, will cover some final issues. These are the things that
separate a fragile mess that almost works most of the time from a useful program that can be trusted.
On Simplicity
The simplicity of Python is so important that we're going to emphasize it heavily. In other languages,
desirable features were often added as new statements in the language. The language then evolved
into a complicated mixture of optional extensions and operating-system features muddled up with the
original core statements of the language. A poorly designed language rarely works the same on different
computers or operating systems, or it requires many compromises to achieve portability. This kind of
badly designed language is always hard to learn.
One hint that a language has too many features is that a language subset is available. The most
outstanding example of this is COBOL. There are a number of subsets with different kinds of compatibilities with different tools and operating systems. While originally easy-to-read, COBOL has evolved
into a monstrously complex problem for many businesses.
The Python language has only twenty statements, the language is easy to learn, and there is no need
to create a simplified language subset.
Interpreted. The computer science folks characterize the Python program, python, as an interpreter:
it interprets and executes the Python language statements, doing your data processing. Because it is
interpreting our statements, it can provide useful diagnostic information when something goes wrong. It
can also handle all of the tedious housekeeping that is part of how programs make use of the computer's
resources. As users, we don't see this housekeeping going on, and as newbie programmers we shouldn't have
to cope with it, either.
The computer-science types make a distinction between interpreters (like Python) and compilers (used for
the C language). The C compiler (controlled by a program named cc) translates C language statements
into a program that uses the hardware-specific internal codes used by your computer. The operating system
can then directly execute that resulting program. After you see the results of execution you might make
changes, recompile and re-execute. This compilation step makes everything you do somewhat indirect. The
compiler translates your C statements into another language which is then executed. This indirection makes
compiled languages harder to learn; it also makes diagnosing a problem very hard.
Here's a diagram that may help clarify how Python differs from a language like C. A C programmer
will use a complex IDE which includes the C compiler to translate their C statements into a binary
executable program. A Python programmer uses a simpler IDE, which uses the python
program to execute the Python statements.
Technical Digression
The Python interpreter, which runs Python-language programs, is implemented in the C programming
language and relies on the extensive, well understood, portable C libraries. Using the C-language under
the hood means that it fits seamlessly with Unix, GNU/Linux and POSIX environments. Since these
standard C libraries are widely available for the various MS-Windows variants, Python runs similarly in
just about all computers and operating systems. Because of the abstraction created by the C libraries,
you'll find it impossible to find meaningful differences between Windows-2000, Windows-XP, Red Hat
GNU/Linux and MacOS.
Why does anyone use a compiled language like C? C is more complex than Python and writing C
requires the programmer to keep careful track of a number of housekeeping details. The program that
results from the C compiler is hardware-specific and consequently very fast. This is the key to why
Python helps us out so much. The Python program, having been written in C, and compiled to be
specific to our computer's hardware, is very efficient. However, since we can express our data processing
needs in the (easy to learn) Python language we can use all this speed without having to learn C or
how to compile C-language statements into a program.
When we need blazing speed, we have to write in C. When we need simplicity, we find it easier to write
in Python. We can have the best of both worlds. Most programs only need amazing performance in
small sections of the program. We can, with some care, write just those small sections in C, and then
make that component available to Python. This gives us the speed of C where we need it and the
simplicity of Python everywhere else.
It turns out that Python often does a secret compilation pass on your Python statements in order to
speed things up a hair. It doesn't change the fundamental benefit that accrues because Python is a
kind of interpreter. It only blurs the distinction between compiled and interpreted languages.
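If you're curious, that compilation pass can be made visible. The sketch below uses the built-in compile() and eval() functions; the expression 6 * 7 and the label "<example>" are arbitrary choices of ours. You never need to do this by hand; Python does it for you behind the scenes.

```python
# compile() turns source text into a code object, Python's internal form.
code = compile("6 * 7", "<example>", "eval")

# The interpreter then executes the compiled form.
print(eval(code))    # 42
```

The result is the same as typing 6 * 7 directly, which is the point: the compilation is a hidden convenience, not an extra step for us.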
Libraries. Python, the project, includes a rich set of supporting libraries. These libraries contain the basic
gears, sprockets, flywheels and drive-shafts that you can use to make a program. By separating the library
tool-boxes from the core language, the designers of Python could keep the language simple, which means
the interpreter can be very efficient and reliable. Yet, they can provide an extensive feature set as separate
extensions. Every new idea can be added as another extension.
There are other consequences to having extensive and separate libraries. Principally, good ideas can be
preserved and extended, and bad ideas can be ignored. This basic evolution saves programmers from having
to design everything perfectly the first time. As you get more experience with the Python programming
community, you will see ideas come and go. Some extensions will blossom and become widely used, where
others will be quietly ignored because something better has come along.
Another consequence of having separate libraries is that any programming project should begin with a survey
of available libraries. This can replace unproductive programming with more productive research and reuse.
Development Environment. Finally, we see that Python also comes with a development environment,
or workbench, that you can use to write and execute your Python statements. The integrated development
environment (IDE) includes an editor for writing Python files, and the Python interpreter, plus some other
tools for searching the Python libraries.
Interestingly, the Python development environment is just another Python program. When you double-click
on the IDLE icon, you are starting a Python program that helps you write Python programs. At first, this
seems like a real mind-wrenching problem. You might think of it as similar to asking "which came first, the
chicken or the egg?" It isn't all that bad a problem, however. In this case, someone else wrote IDLE to
help you write your program. Your program, and IDLE (and a large number of other programs) all share
the Python program as the driving engine.
Timeline. The Python programming language was created in 1991 by Guido van Rossum based on lessons
learned doing language and operating system support. Python is built from concepts in the ABC language and Modula-3. For information on ABC, see The ABC Programmer's Handbook [Geurts91], as well
CHAPTER
THREE
USING PYTHON
Taking Your First Steps
Now that you have Python installed, we can start using it. We'll look at a number of ways that we can
interact with the Python application. We'll use these interactions to learn the language.
In later sections, after we've got a more complete grip on the language and start to write programs, we'll
move on to more advanced ways to use the Python program. Our goal is to use Python in an automated
fashion to do data processing. Here in phase one, we'll be using Python manually to learn the language.
We'll describe the direct use of the python program to process Python-language statements in Instant Gratification
: The Simplest Possible Conversation. This will help us get started; it provides immediate gratification, but
isn't the easiest way to work.
We'll dig into IDLE in IDLE Time : Using Tools To Be More Productive. We'll emphasize this as a good
way to learn the language as well as build programs.
Click on the value and use the right arrow key to scroll through the value you find. At the end, add
the following ;C:\python26. Don't forget the ; to separate this search location from other search
locations on the path.
Click OK to save this change. It is now a permanent part of your Windows setup on this computer.
You'll never have to change this again.
7. Finish Changing Your System Properties
The current dialog box has a title of Environment Variables. Click OK to save your changes.
The current dialog box has a title of System Properties. Click OK to save your changes.
When we get the >>> prompt, the Python interpreter is listening to us. We can type any Python statements
we want. Each complete statement is executed when it is entered.
We can ask Python to stop by using the exit() function. We enter exit() at the >>> prompt.
We can also enter the end-of-file character. This varies slightly from operating system to operating system.
MacOS and GNU/Linux. The polite way to tell Python that we're done is to enter Ctrl-D.
Windows. The polite way to finish a conversation with Python is Ctrl-Z, followed by Enter.
1. The shell prompted me with MacBook-5:~ slott$. I typed python to start the python program
running.
2. Python provided some information on itself.
5. Python prompted me with >>>. I typed 351 / 18 to compute miles per gallon I got driving to Newark
and back home. This is a complete Python statement, and Python will evaluate that statement.
6. Python responded with 19: a rotten 19 miles per gallon. I've got to get a new car that uses less
gasoline.
This shows Python doing simple integer arithmetic. There were no fractions or decimal places involved.
When I entered 351 / 18 and then hit Return, the Python interpreter evaluated this statement. Since the
value is not None, Python printed the result.
The usual assumption for numbers is that they are integers, sometimes called whole numbers.
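A note for readers who have a newer Python than the 2.6 release used in this book: in Python 3, the / operator always keeps the fraction, and whole-number division is written with the // operator instead. The // operator is also available in Python 2.6, so this minimal sketch gives the same answer either way.

```python
# // divides two integers and discards any fraction.
print(351 // 18)    # 19
```

If you type 351 / 18 and see 19.5 instead of 19, you are running Python 3; use // wherever this book relies on integer division.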
Note that Python does not like , in numbers. Outside Python, we write large numbers with , to break the
numbers up for easy reading. (The exception is the calendar year, where we omit the ,: we write 2007, not
2,007.) Python can't cope with , in the middle of numbers. The mileage on my odometer reads 19,241.
But, in Python we write this as 19241.
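To be precise about what goes wrong: Python does not complain about the comma. Instead, it quietly reads 19,241 as two separate numbers, 19 and 241, grouped together as a pair. The sketch below shows the difference; the variable names oops and odometer are ours.

```python
# The comma separates two values instead of grouping digits.
oops = 19,241
print(oops)            # (19, 241)

# Written without the comma, we get the single number we meant.
odometer = 19241
print(odometer + 1)    # 19242
```

Because there is no error message, a stray comma can be hard to spot, which is one more reason to leave commas out of numbers.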
Bottom Line. For now, be comfortable that Python is perfectly happy with whole numbers. Remember to
avoid commas. We sometimes call these numbers ints, short for integers. Later, we'll see that Python has a
pretty expansive set of numbers available to work with.
1. I typed 351. / 18. to compute miles per gallon I got driving to Newark and back home.
2. Python responded with 19.5: the more accurate 19.5 miles per gallon.
Floating-point isn't adequate for everything, so there's another kind of number that we'll get to later. When
we do financial calculations on US dollars, the decimal point is fixed; we have two digits to the right of the
decimal point and no more. These fixed-decimal point numbers aren't a built-in feature of Python, but there
are ways to extend Python with a library that gives us this capability.
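One library that provides this capability is the decimal module that is installed along with Python; we are assuming here that this is the kind of extension meant above. The sketch below is a preview only, and the prices are made-up values.

```python
from decimal import Decimal

# Floating-point cannot hold ten cents or twenty cents exactly.
print(0.10 + 0.20 == 0.30)         # False

# Decimal values built from strings keep exactly two decimal places.
total = Decimal("0.10") + Decimal("0.20")
print(total)                       # 0.30
print(total == Decimal("0.30"))    # True
```

Notice that the Decimal values are created from quoted text such as "0.10", not from floating-point numbers, so no approximation sneaks in.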
Bottom Line. For now, be comfortable that Python is perfectly happy with floating-point numbers that
have about 17 total digits of accuracy, but a range that is huge. Remember to include a decimal point to tell
Python that you want to see decimal places in the calculation. Also, remember to avoid commas; they're
just confusing.
Additionally, Python (and many other programming languages) provide two handy operators that mathematicians don't normally write down in this form. Mathematicians may talk about modular arithmetic,
with something like a mod m. This is written in Python using the % character. For non-mathematicians,
this is the remainder after division.
>>> 355 % 113
16
>>> 355 / 113
3
Here's what this shows us: 113 goes into 355 with 16 left over. Mathematically, 355 = 3 × 113 + 16.
We'll look at all of these operators closely in Simple Arithmetic : Numbers and Operators.
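Python also has a built-in function, divmod(), that produces both answers at once. This short sketch uses it to check the arithmetic; the names quotient and remainder are our own.

```python
# divmod() returns the whole-number quotient and the remainder together.
quotient, remainder = divmod(355, 113)
print(quotient)     # 3
print(remainder)    # 16

# Check: multiplying back and adding the remainder recovers 355.
print(quotient * 113 + remainder)    # 355
```

This works the same way in Python 2.6 and in newer releases.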
We have to put the 65-32 in parentheses so that it is done before the multiply and divide. Also, you'll note
that when one number is floating point (9.) it forces the calculation to be done as floating-point.
What would happen if we said 65-32*5/9? Try it first, to see what happens.
If we don't include the () for grouping, then Python would do what every mathematician would do: compute
32*5/9 first and then the difference between that and 65. Python did what we said, but not what we meant.
We know the answer is wrong because 65 Fahrenheit can't be the impossibly hot 48 Celsius.
In the second example, we put in extra () that don't change the resulting answer.
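Here is the same comparison written out as a short sketch. We use floating-point values throughout so the results are the same in any version of Python; with floating-point, the wrong answer comes out as 47.2 rather than the 48 that integer arithmetic gives. The variable names are ours.

```python
fahrenheit = 65.0

# With parentheses: subtract first, then scale.
celsius = (fahrenheit - 32.0) * 5.0 / 9.0
print(round(celsius, 1))    # 18.3

# Without parentheses: 32.0 * 5.0 / 9.0 is computed first.
wrong = fahrenheit - 32.0 * 5.0 / 9.0
print(round(wrong, 1))      # 47.2
```

The round() function is used here only to trim the display to one decimal place.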
This leads us to the first of many syntax rules. We'll present them in order of relevance to what we're doing.
That means that we're going to skip over some syntax rules that don't apply to our situation.
Important: Syntax Rule One
Statements must be complete on a single line. If the statement is incomplete, you'll get a SyntaxError
response.
Just to be complete, we'll present syntax rule two, but it doesn't really have much impact on what we're
going to be doing.
Important: Syntax Rule Two
The invisible end-of-line character is slightly different on different platforms. On Windows it is actually two
non-printing characters, where on GNU/Linux and MacOS it is a single non-printing character. You may
notice this when moving files back and forth between operating systems.
There is an escape clause that applies to rule one (one statement, one line). When the parentheses are
incomplete, Python will allow the statement to run on to multiple lines.
>>> ( 65 - 32
... ) * 5 / 9
18
>>>
This is called an escape and it allows you to break up an extremely long statement. It creates an escape
from the usual meaning of the end-of-line character; the end-of-line is demoted to
just another whitespace character, and loses its meaning of "end-of-statement, commence execution now".
Using \ at the end of a line escapes the meaning of the enter key, and allows a statement to continue onto
multiple lines. While legal, this isn't the best policy, and we're going to avoid doing this.
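For comparison, here are both kinds of continuation as they might appear in a program file rather than at the >>> prompt. This is a sketch only; the variable names celsius_a and celsius_b are ours.

```python
# Open parentheses let a statement continue on the next line.
celsius_a = (
    (65 - 32)
    * 5.0 / 9.0
)

# A backslash at the very end of a line does the same job, less visibly.
celsius_b = (65 - 32) \
    * 5.0 / 9.0

print(celsius_a == celsius_b)    # True
```

The parentheses are easy to see and hard to get wrong. The backslash only works when it is the very last character on the line, which is one reason to prefer the parentheses.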
If this is your first time using Python, you should definitely check out
the tutorial on the Internet at http://docs.python.org/tutorial/.
Enter the name of any module, keyword, or topic to get help on writing
Python programs and using Python modules. To quit this help utility and
return to the interpreter, just type "quit".
To get a list of available modules, keywords, or topics, type "modules",
"keywords", or "topics". Each module also comes with a one-line summary
of what it does; to list the modules whose summaries contain a given word
such as "spam", type "modules spam".
help>
You can see that the prompt is now help>. To go back to ordinary Python programming mode, enter quit.
help> quit
You are now leaving help and returning to the Python interpreter.
If you want to ask for help on a particular object directly from the
interpreter, you can type "help(object)". Executing "help('string')"
has the same effect as typing a particular string at the help> prompt.
>>>
You'll get a page of output, ending with a special prompt from the program that's helping to display the
help messages. The prompt varies: Mac OS and GNU/Linux will show one prompt, Windows will show
another.
Mac OS and GNU/Linux. In standard OSs, you're interacting with a program named less; it will
prompt you with : for all but the last page of your document. For the last page it will prompt you with
(END).
This program is very sophisticated. The four most important commands you need to know are the following.
q Quit the less help viewer.
h Get help on all the commands which are available.
Enter a space to see the next page.
b Go back one page.
Windows. In Windows, you're interacting with a program named more; it will prompt you with -- More
--. The four important commands you'll need to know are the following.
q Quit the more help viewer.
h Get help on all the commands which are available.
Enter a space to see the next page.
355 / 113
The LEO editor can be used to create complex programs. It is a literate programming editor with
outlines. LEO isn't as easy for newbies to use because it is focused on experts. LEO is written in
Python, also.
Less Ideal. From the Terminal prompt, you can type the following command to start IDLE .
env python /usr/lib/python2.6/idlelib/idle.py &
Yes, this is long. There are some ways to shorten this up. We'll cover some of them because they tell us a
lot about how GNU/Linux really works. You only have to do one of these. Pick the one method that seems
simplest to you and ignore the others.
Write a script. This is a short file that becomes a new Linux command.
Update your PATH setting. This is a change to your environment that makes the idle.py file usable
by your shell.
Create an alias. This is a change to your environment that creates a new Linux command.
Create a link. This adjusts the file system so that idle appears to be in your home directory. This is a
bit risky because your file system may not be organized the same as mine, meaning my example may
not work for you.
3.2. IDLE Time : Using Tools To Be More Productive
Write a Script. To create a script, you'll put a command in a file, and mark that file as executable. Once
you've done these two steps, you've effectively added a new command to your GNU/Linux environment.
1. Use an editor (I like gedit) to create a file named idle. Put the following into that file as the only line.
Save the file into your home directory.
env python /usr/lib/python2.6/idlelib/idle.py &
2. Log out. That way, when you log in again, your .profile is executed.
Now you can type idle.py to run the IDLE program.
Create an Alias. To create an alias, you have to make sure that the alias command is executed every time
you log in.
Most shells, it turns out, read a hidden file named .profile every time you log in to GNU/Linux. The
bash shell reads .bash_profile. There's a two step process to creating an alias. Once you've done these
two steps, you've configured your shell environment.
1. Use an editor (I like gedit) to update your .profile or .bash_profile file.
You won't see this file in ordinary directory listings; the . in the name means that it's hidden; use ls
-a to see all files. Insert the following line at the very end. Note that the apostrophes are essential to
making this work.
alias idle='env python /usr/lib/python2.6/idlelib/idle.py &'
2. Log out. That way, when you log in again, your .profile is executed.
Now you can type idle to run the IDLE program. This is a handy technique, but we don't want to go
overboard creating too many aliases.
If you have personal firewall software and it does warn you about IDLE, you can ignore your personal
firewall's messages. Your firewall is detecting ordinary activity called interprocess communication among
the various components of IDLE. Rather than a personal firewall, I buy routers that do this for all the
computers in my home.
You can use the File menu, item Exit to exit from IDLE. You can also close the window by clicking on the
close icon.
Interesting. When the Stockholm weather says -20 Celsius, that is -4 Fahrenheit. That's cold.
Drat! We used numbers without any decimal points. That means we used integer division, which won't be
very accurate. We'd like to try that statement again without having to retype the entire thing from scratch.
We can interact directly with Python at the command-line. This was what we saw in Instant
Gratification : The Simplest Possible Conversation. This is available because Python is most usable
when it is a shell program.
A tool like IDLE makes it easier to enter Python statements and execute them. IDLE shows us
a Python Shell window, which is effectively the command-line interaction with Python, plus offers a
handy text editor as a bonus. IDLE is both written in Python and uses Python as a shell program.
A tool like BBEdit or TextPad is a handy text editor that can execute the Python command-line
tool for us. This interaction is made possible because under the hood, Python is a command-line
program with the ultra simple character-oriented command-line interface.
Why all the colors? Can I turn that off? Some newbies find syntax coloring distracting. Most experienced programmers find it very handy because the colors provide immediate feedback that the syntax
of the statement is sensible.
If you want to change or disable the syntax coloring, use the Options > Configure IDLE... menu item to provide
different syntax coloring.
CHAPTER
FOUR
ARITHMETIC AND EXPRESSIONS
The heart of Python is a rich variety of numeric types and arithmetic operators. We can use these various
numeric types to do basic mathematical operations on whole numbers, real numbers and complex numbers.
We'll look at the basics in Simple Arithmetic : Numbers and Operators.
In addition to the basic arithmetic capabilities, many kinds of problems need additional mathematical and
financial functions. We'll look at some of the built-in functions and some functions in add-on modules in
Better Arithmetic Through Functions and Extra Functions: math and random.
For more specialized problems, Python has a variety of additional operators. We'll look more deeply at these
additional operators in Special Ops : Binary Data and Operators.
We'll cover some optional topics in More Advanced Expression Topics, including different approaches to
execution of Python statements and some notes on Python writing style.
Pay close attention to 42 * 19 + 21 / 6. In particular, remember that your desktop calculator may say that
21 ÷ 6 = 3.5. However, since these are all integer values, Python uses integer division, discarding fractions
and remainders. 21/6 is precisely 3.
Does Python Round? Try this to see if Python rounds. If Python does not round, the answers will all
be 2. If Python does round, the answers will be 2, 2, 3 and 3.
8 / 4
9 / 4
10 / 4
11 / 4
What happened? It shouldn't be any surprise that integer arithmetic is done very simply. For more sophistication, we'll have to use floating-point numbers and complex numbers, which we'll look at in later
sections.
New Syntax: Functions. More sophisticated math is separated into the math module, which we will look
at in The math Module Trig and Logs. Before we get to those advanced functions, we'll look at a few
less-sophisticated functions.
The absolute value (sometimes called the magnitude or absolute magnitude) operation is done using a
slightly different syntax than the conventional mathematical operators like + and - that we saw above.
A mathematician would write |n|, but this can be cumbersome for computers. Instead of copying the
mathematical notation, Python uses a kind of syntax that we call prefix notation. In this case, the operation
is a prefix to the operands.
Here are some examples using the abs() function.
>>> abs(-18)
18
>>> abs(6*7)
42
>>> abs(10-28/2)
4
The expression inside the parentheses is evaluated first. In the last example, the evaluation of 10-28/2 is -4.
Then the abs() function is applied to -4. The evaluation of abs(-4) is 4.
Here's the formal Python definition for the absolute value function.
abs(number) → number
Returns the absolute value of number.
For non-numeric arguments, raises a TypeError.
This tells us that abs() has one parameter that must be a numeric value and it returns a numeric value. It
won't work with strings or sequences or any of the other Python data types we'll cover in Basic Sequential
Collections of Data.
3.1415926
867.5309
-42.0
We can, if we want, write our numbers in scientific notation. A scientist might write $6.022 \times 10^{23}$. In Python,
we use the letter E or e in place of the "× 10". Here are some examples.
6.022e23
1.6726e-27
8.675309e3
2.998e8
All of the arithmetic operators we saw in Plain Integers, Also Known As Whole Numbers also apply to
floating-point numbers. Here are a couple of examples.
>>> 6.62e-34 * 2.99e8
1.97938e-25
>>> 3.1415926 * 3.5**2
38.484509350000003
You Call That Accurate? What is going on with that last example? What is that 0000003 hanging off
the end of the answer?
That tiny, tiny error amount is the difference between the decimal (base 10) display and the binary (base
2) internal representation of the numbers. That tiny, annoying error can be made invisible when we look
at formatting our output in Sequences of Characters : str and Unicode. For now, however, we'll leave this
alone until we have a few more Python skills under our belt.
One consequence is that some fractions are spot-on, while others involve an approximation. Anything that
involves halves, quarters, eighths, etc., will be represented precisely. 3.1 has to be approximated, where 3.25
is something that Python handles exactly.
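One way to see which fractions are spot-on is to ask Python for the exact value it stored. This is only a sketch: it assumes the standard fractions module and a Python version (2.7 or later) where Fraction accepts a floating-point number directly. The names exact_quarter and approx_tenth are invented here.

```python
from fractions import Fraction

# Fraction(float) recovers the exact value that the binary float stores.
exact_quarter = Fraction(3.25)
approx_tenth = Fraction(3.1)

print(exact_quarter == Fraction(13, 4))   # 3.25 is stored exactly
print(approx_tenth == Fraction(31, 10))   # 3.1 is only approximated
```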
Important: Mixing Numbers
When you mix numbers, as in 2 + 3.14159, Python coerces the integer value to a floating-point value. This
assures that you never lose any information. It also means that you don't have to meticulously check every
number in a statement to be sure that they are all floating-point. As long as some numbers are floating-point,
the others will likely get promoted properly.
The coercion rules are done for each individual operation. 2+3/4.0 and 2.0+3/4 will do different things.
We'll return to this below.
Scientific Notation. Floating point numbers are stored internally using a fraction and an exponent, in a
style some textbooks call scientific notation. Usual scientific notation uses a power of 10. In the Python
language, we write the numbers as if we were using a power of 10. We think of a number like 123000 as
1.23e5. Mathematically, it means the following,
$$n = 1.23 \times 10^{5}$$
While the Python language allows us to enter our numbers in good-old decimal, our computer doesn't use
base 10, it uses base 2. Really, our floating point numbers are converted to the following form, where a is
the fraction and b is the exponent.
$$n = a \times 2^{b}$$
Specifically, it becomes this inside the computer's hardware:
$$n = 0.93841552734375 \times 2^{17}$$
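We can check that fraction and exponent for ourselves. The sketch below assumes the math module's frexp() and ldexp() functions, which split a floating-point number into its base 2 parts and put it back together. The names fraction, exponent and rebuilt are just for this illustration.

```python
import math

# frexp() splits a float into a fraction and a power-of-two exponent.
fraction, exponent = math.frexp(123000.0)
print(fraction)
print(exponent)

# ldexp() reverses the split: fraction * 2**exponent.
rebuilt = math.ldexp(fraction, exponent)
print(rebuilt)
```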
This emphasizes how the two conversions, from the value of 1.23 (as entered in base 10) to 0.938... blah
blah blah (in base 2) and then back to base 10 to display it for human consumption, reveal tiny differences
in how a decimal fraction is approximated by a binary fraction.
One important consequence of this is the need to do some algebra before simply translating a formula into
Python. Specifically, if we subtract two nearly-equal floating point numbers, we're going to magnify the
importance of the stray error bits that are artifacts of conversion.
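Here is a minimal sketch of that magnification. The value 1e-15 and the names tiny, difference and relative_error are chosen only for illustration; any two nearly-equal numbers show a similar effect.

```python
tiny = 1e-15

# Mathematically this is exactly tiny, but 1.0 + tiny cannot be stored exactly.
difference = (1.0 + tiny) - 1.0

# How far off is the answer, as a fraction of the value we expected?
relative_error = abs(difference - tiny) / tiny
print(difference)
print(relative_error)
```

The stray bits were invisible in 1.0 + tiny, but after the subtraction they are a noticeable share of the answer.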
>>> 2**32
4294967296L
There are about 4 billion ways to arrange 32 bits. How many bits in 1K of memory? 1024 × 8 bits. How
many combinations of bits are possible in 1K of memory?
2**(1024*8)
I won't attempt to reproduce the output from Python. It has 2,467 digits. There are a lot of different
combinations of bits in only 1K of memory. The computer I'm using has 256 × 1024 K of memory; there are
a lot of combinations of bits available in that memory.
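Rather than print the whole number, we can ask Python to count the digits for us. This sketch converts the number to a string with str() and measures it with len(); the names combinations and digit_count are invented for the example.

```python
combinations = 2 ** (1024 * 8)

# Turn the number into a string of digits, then count the characters.
digit_count = len(str(combinations))
print(digit_count)
```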
Important: Mixing Numbers
When you mix numbers, as in 2 + 3L, Python coerces the integer value to a long value. This assures that
you never lose any information. If you mix long and floating-point numbers, as in 3.14 + 123L, the long
number is converted to floating-point.
In the first example (2+3/4.0), the first expression to be evaluated (3/4.0) involves coercing 3 to 3.0, with a result of
0.75. Then the 2 is coerced to 2.0 and the two values added to get 2.75.
In the second example (2.0+3/4), the first expression to be evaluated (3/4) is done as integer values, with a result of
0. Then this is coerced to 0.0 and added to 2.0 to get 2.0.
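The two evaluations can be sketched side by side. One assumption to flag loudly: the second line uses //, which forces the integer division that Python 2 applies to 3/4, so the sketch gives the same answers on newer Pythons where 3/4 would be 0.75. The names first and second are made up here.

```python
# 3/4.0 is a floating-point division, so the fraction survives.
first = 2 + 3 / 4.0

# 3//4 is integer division (what Python 2 does for 3/4); the fraction is lost.
second = 2.0 + 3 // 4

print(first)
print(second)
```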
As we'll see in Functions are Factories (really!) we can force specific conversions if Python's automatic
conversions aren't appropriate for our problem.
We'll use strings more heavily in Seeing Results : The print Statement. It turns out that strings are actually
very sophisticated objects, so we'll defer exploring them in depth until Sequences of Characters : str and
Unicode.
Note: Adjacent String Literals
As a special case, Python will automatically concatenate adjacent string literals. This only works for quoted
strings, but sometimes you'll see programs that look like this.
big_string = "First part of the message, " \
"second part of the message." \
"The end of the message."
Remember from Syntax Rule 5 that the \ extends the statement to the next line. This statement is three
adjacent string literals. Python will concatenate these three strings to make one long message.
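A variation you may also run into wraps the literals in parentheses instead of ending each line with a \. This sketch assumes that style; it also adds a space after the second sentence so the joined message reads cleanly.

```python
# Parentheses let the statement span lines without the backslash.
big_string = ("First part of the message, "
              "second part of the message. "
              "The end of the message.")
print(big_string)
```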
An attempt to use digits 8 and 9 in an octal number is illegal. In base 8, we only have the digits 0 to 7. It
doesn't make sense to try to use 8 and 9 in an octal value.
Consequently, there's a strange-looking error message if you do try.
>>> 09
File "<stdin>", line 1
09
^
SyntaxError: invalid token
In the obscure parlance of language parsing, any symbol, including a number, is a token. In this case, the
token could not be parsed because it began with a zero, but it did not continue with digits between 0 and
7. It isn't a proper numeric token.
Tip: Debugging Octal Numbers (Leading Zero Alert)
A number that begins with a zero is supposed to be in base 8. If you are copying numbers from another
source, and that other source uses leading zeros, you may be surprised by what Python does. If the number has
digits of 8 or 9, it's illegal. Otherwise, the number isn't decimal.
I spent a few hours debugging a program where I had done exactly this. I was converting a very ancient
piece of software, and some of the numbers had zeros slapped on the front to make them all line up nicely.
I typed them into Python without thinking that the leading zero meant it was really base 8 not base 10.
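To see how far apart the two readings can be, here is a sketch that treats one zero-padded value both ways. It uses the int() function with an explicit base on a string, so nothing depends on how a particular Python version reads a bare leading zero. The value '0123' and the names are hypothetical.

```python
# Hypothetical padded value copied from an old listing.
padded = '0123'

as_decimal = int(padded, 10)   # what the old software meant
as_octal = int(padded, 8)      # what a leading zero means to Python 2

print(as_decimal)
print(as_octal)
```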
Base 16 - Hexadecimal. A number with a leading 0x or 0X is hexadecimal, base 16. In order to count
in base 16, we'll need 16 distinct digits. Sadly, our alphabet only provides us with ten digits: 0 through 9.
The computer folks have solved this by using the letters a-f (or A-F) as the missing 6 digits. This gives us
the following way to count in base 16: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f, 10, 11, 12, 13, 14, etc.
In math textbooks, they sometimes write this: $53_{16}$ to indicate that it's a value using base 16. We figure
out the value by applying base 16 place values of 1, 16, 256, etc. So we have $53_{16} = 5 \times 16 + 3 = 83_{10}$.
In Python, we use 0x as a prefix.
Here are some examples of hexadecimal numbers:
0x0
0x123
-0xcbb2
0xbead
When you enter one of these numbers, Python evaluates it as an expression, and responds in base 10.
>>> 0x53
83
>>> 0x1ff
511
>>> 0xffcc33
16763955
Hex or octal notation can be used for long numbers. 0x234C678D098BAL, for example, is 620976988526778L.
355/113 * ( 1 - 0.0003/3522 )
22/17 + 37/47 + 88/83
(553/312)**2
2. Stock Value.
Compute value from number of shares × price for a stock.
Once upon a time, stock prices were quoted in fractions of a dollar, instead of dollars and cents. Create
a simple expression for 125 shares purchased at 3 and 3/8. Create a second simple print statement for
150 shares purchased at 2 1/4 plus an additional 75 shares purchased at 1 7/8.
Don't manually convert 1/4 to 0.25. Use a complete expression of the form 2+1/4.0, just to get more
practice writing expressions.
3. Convert Between °C and °F.
Convert temperature from one system to another.
Conversion Constants: 32 °F = 0 °C, 212 °F = 100 °C.
The following two formulae convert between °C (Celsius) and °F (Fahrenheit).
$$F = 32 + \frac{212 - 32}{100} C$$
$$C = (F - 32) \frac{100}{212 - 32}$$
$$m = \frac{r p (r + 1)^{n}}{(r + 1)^{n} - 1}$$
$$m = \frac{r p (r + 1)^{n}}{[(r + 1)^{n} - 1](r + 1)}$$
Use any of these forms to compute the mortgage payment, m, due with a principal, p, of $110,000,
an interest rate, r, of 7.25% annually, and payments, n, of 30 years. Note that banks actually process
things monthly. So you'll have to divide the interest rate by 12 and multiply the number of payments
by 12.
5. Surface Air Consumption Rate.
Surface Air Consumption Rate (SACR) is used by SCUBA divers to predict air used at a particular depth.
For each dive, we convert our air consumption at that dive's depth to a normalized air consumption at the surface. Given depth (in feet), d, starting tank pressure (psi), s, final tank pressure
(psi), f, and time (in minutes) of t, the SACR, c, is given by the following formula.
$$c = \frac{33(s - f)}{t(d + 33)}$$
Typical values for pressure are a starting pressure of 3000, final pressure of 500.
A medium dive might have a depth of 60 feet, time of 60 minutes.
A deeper dive might be to 100 feet for 15 minutes.
A shallower dive might be 30 feet for 60 minutes, but the ending pressure might be 1500. A
typical c (consumption) value might be 12 to 18 for most people.
Write expressions for each of the three dive profiles given above: medium, deep and shallow.
Given the SACR, c, and a tank starting pressure, s, and final pressure, f, we can plan a dive to
depth (in feet), d, for time (in minutes), t, using the following formula. Usually the 33(s - f)/c
is a constant, based on your SACR and tanks.
$$\frac{33(s - f)}{c} = t(d + 33)$$
For example, tanks you own might have a starting pressure of 2500 and ending pressure of 500,
you might have a c (SACR) of 15.2. You can then find possible combinations of time and depth
which you can comfortably dive.
Write two expressions that show how long one can dive at 60 feet and 70 feet.
1. Wind Chill. Used by meteorologists to describe the effect of cold and wind combined.
Given the wind speed in miles per hour, V, and the temperature in °F, T, the Wind Chill, w, is given
by the formula below.
Wind Chill, new model
$$35.74 + 0.6215 T - 35.75 V^{0.16} + 0.4275 T V^{0.16}$$
Wind Chill, old model
0.081 (3.71
Wind speeds are for 0 to 40 mph; above 40, the difference in wind speed doesn't have much practical
impact on how cold you feel.
You can do square root of a given wind speed, V, using an expression like V ** 0.5. For example, a
20 mph wind would use 20 ** 0.5 in the formula.
Write an expression to compute the wind chill felt when it is -2 °F and the wind is blowing 15 miles
per hour.
2. Force on a Sail.
How much force is on a sail?
A sail moves a boat by transferring force to its mountings. The sail in the front (the jib) of a typical
fore-and-aft rigged sailboat hangs from a stay. The sail in the back (the main) hangs from the mast.
The forces on the stay (or mast) and sheets move the boat. The sheets are attached to the clew of the
sail.
The force on a sail, f, is based on sail area, a (in square feet) and wind speed, w (in miles per hour).
$$f = w^{2} \times 0.004 \times a$$
For a small racing dinghy, the smaller sail in the front might have 61 square feet of surface. The larger,
main sail, might have 114 square feet.
Write an expression to figure the force generated by a 61 square foot sail in 15 miles an hour of wind.
3. Craps Odds. What are the odds of winning on the first throw of the dice?
There are 36 possible rolls on 2 dice that add up to values from 2 to 12. There is just 1 way to roll a
2, 6 ways to roll a 7, and 1 way to roll a 12. We'll take this as given until a later exercise where we
have enough Python to generate this information.
Without spending a lot of time on probability theory, there are two basic rules we'll use time and
again. If any one of multiple alternate conditions needs to be true, usually expressed as or, we add
the probabilities. When there are several conditions that must all be true, usually expressed as and,
we multiply the probabilities.
Rolling a 3, for instance, is rolling a 1-2 or rolling a 2-1. We add the probabilities: 1/36 + 1/36 =
2/36 = 1/18.
On a come out roll, we win immediately if 7 or 11 is rolled. There are two ways to roll 11 (2/36) and 6
ways to roll 7 (6/36).
Write an expression to print the odds of winning on the come out roll. This means rolling 7 or rolling
11. Express this as a fraction, not as a decimal number; that means adding up the numerator of each
number and leaving the denominator as 36.
4. Roulette Odds.
How close are payouts and the odds?
An American (double zero) Roulette wheel has numbers 1-36, 0 and 00. 18 of the 36 numbers are red,
18 are black and the zeros are green. The odds of spinning red, then, are 18/38. The odds of zero or
double zero are 2/38.
Red pays 2 to 1; the real odds are 38/18.
Write an expression that shows the difference between the pay out and the real odds.
You can place a bet on 0, 00, 1, 2 and 3. This bet pays 6 to 1. The real odds are 5/36.
Write an expression that shows the difference between the pay out and the real odds.
We can then use an eighth bit to carry a primitive error-detection code. We can insist that each valid
character code have an even number of bits switched on. If we receive a character with an odd number
of bits, we know that a bit got garbled. This is one of the many historical precedents that made 8-bit
bytes appealing.
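The even-parity idea can be sketched in a few lines. Everything here is hypothetical illustration: the function names with_even_parity() and looks_garbled() are invented, and the sketch leans on the built-in bin() function (which shows a number's bits as a string) plus the bit operators covered in Special Ops : Binary Data and Operators.

```python
def with_even_parity(code):
    """Return the 7-bit code with an eighth (parity) bit added."""
    ones = bin(code).count('1')
    parity_bit = ones % 2          # 1 only when the count of 1-bits is odd
    return code | (parity_bit << 7)

def looks_garbled(byte):
    """A received byte with an odd number of 1-bits fails the check."""
    return bin(byte).count('1') % 2 == 1

sent = with_even_parity(ord('C'))   # 'C' is 67, binary 1000011
damaged = sent ^ 0b00000100         # flip one bit in transit

print(looks_garbled(sent))
print(looks_garbled(damaged))
```

Note that this scheme only notices an odd number of flipped bits; two flips in the same byte would cancel out.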
Also, of course, there is an elegant symmetry to using 8-bit bytes when we are using binary number
coding. The powers of two that we use for binary number positions are 1, 2, 4, 8, 16, 32, 64 and 128.
This sequence of numbers has almost mystic significance. Of course we would prefer 8-bit bytes over
9-bit bytes. 32-bit numbers fit this sequence of numbers better than 36-bit numbers.
From Bytes to Words. Once we've settled on 8-bit bytes, the next question is how many bytes make
up a respectable word. Early computers had 64 kilobytes of memory, a number that requires only
16 bits (2 bytes) to represent. We can use a two-byte register to identify any of the bytes in memory.
Many early microprocessors made use of this. The legendary Apple ][ PC had a 6502 processor chip
that worked this way. Growing this to 640K only adds 4 more bits to the address information, a kind
of half-byte compromise that Microsoft made use of to create DOS for the Intel 8088 processor chip.
In the metric measurement system, a kilometer is 1,000 meters. In the world of computers, there is an
elegant power-of-two number that we use instead: 1024. A kilobyte, then is 1024 bytes; a megabyte is
1024*1024 = 1,048,576 bytes; a gigabyte is 1,073,741,824 bytes.
As the amount of memory grew, the size of numbers had to grow so that each location in memory
could have a unique numeric address. Currently, 32-bit numbers are oriented around computers with
2 gigabytes of memory. Newer, larger computers use 64-bit numbers so that they can comfortably
handle more than 2 GB of memory.
Is the 8-bit byte still relevant? When we look at the world's alphabets, we discover that our 26-letter
US Latin alphabet isn't really very useful. For most European languages that use the Latin alphabet
we'll need to add a number of accented characters. For mathematics, we'll need to add a huge number
of special characters. Once we open the door, we might as well provide for non-Latin alphabets like
Greek, Arabic, Cyrillic, Hebrew and others. We're going to need a lot more than 128 character codes.
And then there's the Chinese problem: there are thousands of individual characters. This is solved by
having Multi-byte Character Sets (MBCS). Currently the Unicode standard uses as many as four bytes
to represent the world's alphabets.
Since a byte is no longer an individual character, it is not relevant for that purpose. However, it is the
unit in which memory and data are measured, and will be for the foreseeable future.
A function is an expression, with the same syntactic role as any other expression, for example 2+3. You
can freely combine functions with other expressions to make more complex expressions. Additionally, the
arguments to a function can also be expressions. Therefore, we can combine functions into more complex
expressions pretty freely. This takes some getting used to, so we'll look at some examples.
1 >>> 3*abs(-18)
2 54
3 >>> pow(8*2, 3)*1.5
4 6144.0
5 >>> round(66.2/7)
6 9.0
7 >>> 8*round(abs(50.25)/4.0, 2)
8 100.48
1. In the first example, Python has to compute a product. To do this, it must first compute the absolute
value of -18. Then it can multiply the absolute value by 3.
3. In the second example, Python has to compute a product of a pow() function and 1.5. To do this,
it must first compute the product of 8 times 2 so that it can raise it to the 3rd power. This is then
multiplied by 1.5. You can see that first Python evaluates any expressions that are arguments to
the function, then it evaluates the function. Finally, it evaluates the overall expression in which the
function occurs.
5. In the third example, Python computes the quotient of 66.2 and 7, and then rounds this to the nearest
whole number.
7. Finally, the fourth example does a whopping calculation that involves several steps. Python has to find
the absolute value of 50.25, divide this by 4, round that answer off to two positions and then multiply
the result by 8. Whew!
Note that there is some visual ambiguity between using [ and ] in our Python programming and using [
and ] as markup for the grammar rules. Usually the context makes it clear.
Note that pow(x,0.5) is the square root of x. Also, the function math.sqrt() is the square root of x. The
pow() function is one of the built-in functions, while the square root function is only available in the math
library. We'll look at the math library in The math Module - Trig and Logs.
In the next example we'll get the square root of a number, and then square that value. It'll be a two-step
calculation, so we can see each intermediate step.
>>> pow(2, 0.5)
1.4142135623730951
>>> _ ** 2
2.0000000000000004
The first question you should have is what does that _ mean?
The _ is a Python short-cut. During interactive use, Python uses the name _ to mean the result it just
printed. This saves us retyping things over and over. In the case above, the previous result was the value
of pow( 2, 0.5 ). By definition, we can replace a _ with the entire previous expression to see what is really
happening.
>>> pow(2, 0.5) ** 2
2.0000000000000004
Until we start writing scripts, this is a handy thing. When we start writing scripts, we won't be able to use
the _; instead we'll use something that's much clearer and more precise.
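As a preview, and only as a guess at what that clearer alternative looks like, here is a sketch that saves the intermediate result under a name of our own choosing. The names root and squared are invented for this illustration; naming values is covered properly in Expressions, Constants and Variables.

```python
# In a script there is no _, so we save the intermediate result under a name.
root = pow(2, 0.5)
squared = root ** 2

print(root)
print(squared)
```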
4.2.4 Accuracy?
Let's go back to the previous example: we'll get the square root of a number, and then square that value.
>>> pow( 3, 0.5 )
1.7320508075688772
>>> _ ** 2
2.9999999999999996
>>> pow( 3, 0.5 ) ** 2
2.9999999999999996
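Since the answer is a hair away from 3, a common workaround is to ask whether two values are close enough rather than exactly equal. This sketch assumes a tolerance of 1e-9, which is an arbitrary choice for illustration; the names squared and tolerance are invented.

```python
squared = pow(3, 0.5) ** 2
tolerance = 1e-9   # assumed "close enough" threshold for this illustration

print(squared == 3)                    # exact comparison
print(abs(squared - 3) < tolerance)    # comparison within the tolerance
```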
The ... is Python's hint that the statement is incomplete. You'll need to finish the ()'s so that the statement
is complete.
If you provide the optional second parameter, this is the number of decimal places to round to. If the
number of decimal places is a positive number, this is decimal places to the right of the decimal point.
If the number of decimal places is a negative number, this is the number of places to the left of the
decimal point.
>>> round(678.456)
678.0
>>> round(678.456, 2)
678.46000000000004
>>> round(678.456, -1)
680.0
So, rounding off to -1 decimal places means the nearest 10. Rounding off to -2 decimal places is the nearest
100. Pretty handy for doing business reports where we have to round off to the nearest million.
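Here is a short sketch of those negative decimal places at work. The revenue figure is made up for the example, as are the variable names.

```python
revenue = 12345678.90   # hypothetical figure for a business report

nearest_ten = round(678.456, -1)
nearest_hundred = round(678.456, -2)
nearest_million = round(revenue, -6)

print(nearest_ten)
print(nearest_hundred)
print(nearest_million)
```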
int(x) → number
Creates an integer equal to the string or number x. This will chop off all of the digits to the right of
the decimal point in a floating-point number. If a string is given, it must be a valid decimal integer
string.
>>> int('1234')
1234
>>> int(3.14159)
3
long(x) → number
Creates a long integer equal to the string or number x. If a string is given, it must be a valid decimal
integer. The expression long(2) has the same value as the literal 2L. Examples: long(6.02E23),
long(2).
>>> long(2)**64
18446744073709551616L
>>> long(22.0/7.0)
3L
The first example shows the range of values possible with 64-bit integers, available on larger computers. This
is a lot more than the paltry two billion available on a 32-bit computer.
Complex Numbers - Math wizards only. Complex is not as simple as the others. A complex number
has two parts, real and imaginary. Conversion to complex typically involves two parameters.
complex(real [, imag ]) → number
Creates a complex number with the real part of real; if the second parameter, imag, is given, this is
the imaginary part of the complex number, otherwise the imaginary part is zero.
If this syntax synopsis with the [ and ] is confusing, you'll need to see Function Syntax Rules.
Examples:
>>> complex(3,2)
(3+2j)
>>> complex(4)
(4+0j)
Note that the second parameter, with the imaginary part of the number, is optional. This leads to two
different ways to evaluate this function. In the example above, we used both variations.
Conversion from a complex number (effectively two-dimensional) to a one-dimensional integer or float is not
directly possible. Typically, you'll use abs() to get the absolute value of the complex number. This is the
geometric distance from the origin to the point in the complex number plane. The math is straight-forward,
but beyond the scope of this introduction to Python.
>>> abs(3+4j)
5.0
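For the curious, the distance can be checked by hand with the Pythagorean rule. This sketch assumes the real and imag attributes that Python's complex numbers carry and math.sqrt() from the math module; the names z, distance and by_hand are invented here.

```python
import math

z = complex(3, 4)
distance = abs(z)

# The same distance from the real and imaginary parts, via Pythagoras.
by_hand = math.sqrt(z.real ** 2 + z.imag ** 2)

print(distance)
print(by_hand)
```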
Note that the results are surrounded by ' marks. These apostrophes tell us that these aren't actually
numbers; they're strings of digits.
What's the difference? Try this and see.
11+12
11+'12'
A string of digits may look numeric to you, but Python won't look inside a string to see if it looks like a
number. If it is a string (with " or '), it is not a number, and Python won't attempt to do any math.
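The sketch below spells out the three cases. It uses a try statement to catch the error so the example keeps running; that statement hasn't been introduced yet, so treat it as scaffolding. The names total, joined and mixed_failed are made up.

```python
total = 11 + int('12')        # convert the digits first, then add
joined = '11' + '12'          # two strings are joined, not added

try:
    11 + '12'                 # a number plus a string is an error
    mixed_failed = False
except TypeError:
    mixed_failed = True

print(total)
print(joined)
print(mixed_failed)
```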
Here are the formal definitions of these two functions. These aren't very useful now, but we'll return to them
time and again as we learn more about how Python works.
str(object) → string
Creates a string representation of object.
repr(object) → string
Creates a string representation of object, usually in Python syntax.
The last example (max( '10', '11', '2' )) shows the alphabetical order of digits problem. Superficially,
this looks like three numbers (10, 11 and 2). But, they are quoted strings, and might as well be words. What
would the result of max( 'ba', 'bb', 'c' ) be? Anything surprising about that? The alphabetic order
rules apply when we compare string values. If we want the numeric order rules, we have to supply numbers
instead of strings.
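Here is one way to get the numeric answer from the same three strings. The sketch converts each string with int() before handing it to max(); the list name digit_strings and the result names are invented for the illustration.

```python
digit_strings = ['10', '11', '2']

largest_string = max(digit_strings)                   # alphabetical order
largest_number = max(int(s) for s in digit_strings)   # numeric order

print(largest_string)
print(largest_number)
```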
Here are the formal definitions for these functions.
max(sequence) → object
Returns the object with the largest value in sequence.
min(sequence) → object
Returns the object with the smallest value in sequence.
This statement will tell Python to locate the module named m and provide us with the definitions in that
module. Only the name of the module, m, is added to the local names that we can use. Every name inside
module m must be qualified by the module name. We do this by connecting the module name and the function
name with a dot (.). When we import module math, we get a cosine function that we refer to with module name
dot function name notation: math.cos().
This module qualification has a cost and a benefit. The cost is that you have to type the module name over
and over again. The benefit is that your Python statements are explicit and harbor no assumptions. There
are some alternatives to this. We'll cover them when we explore modules in depth.
Another important thing to remember is that you only need to import a module once to tell Python you will
be using it. By once, we mean once each time you run the Python program. Each time you exit from the
Python program (or turn your computer off, which exits all your programs), everything is forgotten. Next
time you run the Python program, you'll need to provide the import statements to add the modules to
Python for your current session.
An Interesting Example. For fun, try this:
import this
The this module is atypical: it doesn't introduce new object classes or function definitions. Instead, well,
you see that it does something instead of extending Python by adding new definitions.
Even though the this module is atypical, you can still see what happens when you use an extra import.
What happens when you try to import this a second time?
import math
Since this statement only adds math to the names Python can recognize, you'll need to use the math prefix
to identify the functions which are inside the math module.
Here are a couple of examples of some trigonometry. We're calculating the cosine of 45, 60 and 90 degrees.
You can check these on your calculator. Or, if you're my age, you can use a slide rule to confirm that these
are correct answers.
>>> import math
>>> math.cos( 45 * math.pi/180 )
0.70710678118654757
>>> math.cos( 60 * math.pi/180 )
0.50000000000000011
>>> math.cos( 90 * math.pi/180 )
6.123233995736766e-17
>>> round( math.cos( 90*math.pi/180 ), 3 )
0.0
Here are some more of the common math functions, including logarithms, anti-logarithms and
square root.
>>> math.exp( math.log(10.0) / 2 )
3.1622776601683795
>>> math.sqrt( 10.0 )
3.1622776601683795
Returns the Euclidean distance, sqrt( x*x + y*y ) ($\sqrt{x^2 + y^2}$), length of the hypotenuse of a right
triangle with height of y and length of x.
math.log(x) → number
Returns the natural logarithm (base e) of x ($\ln x$), inverse of exp().
math.log10(x) → number
Returns the logarithm (base 10) of x ($\log_{10} x$), inverse of $10^x$.
math.pow(x, y) → number
Returns x**y ($x^y$).
math.sqrt(x) → number
Returns the square root of x ($\sqrt{x}$). This version returns an error if you ask for sqrt(-1), even though
Python understands complex and imaginary numbers.
A second module, cmath, includes a version of sqrt() which creates imaginary numbers as needed.
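The difference between the two square root functions shows up in a small sketch. As before, the try statement is scaffolding to keep the example running past the error, and the names imaginary_root and math_failed are invented.

```python
import cmath
import math

imaginary_root = cmath.sqrt(-1)    # cmath answers with a complex number

try:
    math.sqrt(-1)                  # math refuses negative arguments
    math_failed = False
except ValueError:
    math_failed = True

print(imaginary_root)
print(math_failed)
```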
The math module contains the following other functions for dealing with floating-point numbers.
Other Floating-Point Function Definitions.
math.ceil(x) → number
Returns the next larger whole number. math.ceil(5.1) == 6.0, math.ceil(-5.1) == -5.0.
math.fabs(x) → number
Returns the absolute value of x as a floating-point number.
math.floor(x) → number
Returns the next smaller whole number. math.floor(5.9) == 5.0, math.floor(-5.9) == -6.0.
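Negative numbers are where these functions are easy to mix up, so here is a sketch that lines them up next to int(), which chops toward zero. The for statement is a preview of something covered later; the name value is invented.

```python
import math

# For each sample: the value, its floor, its ceiling, and int()'s chopped result.
for value in (5.1, -5.1, 5.9, -5.9):
    print(value)
    print(math.floor(value))
    print(math.ceil(value))
    print(int(value))
```

Notice that floor() and int() agree for positive numbers but not for negative ones.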
You'll see one of two kinds of results. The details vary among the operating systems.
You'll see a result of nan. This is a special code that means Not a Number.
You'll see an exception, like ValueError or OverflowError. An exception will display a bunch of
debugging information that ends with the exception name and a short explanation.
Both results amount to the same thing: the result cannot be computed.
Since this statement only adds random to the names Python can recognize, you'll need to use the random
prefix on each of the functions in this section.
The randrange() is a particularly flexible way to generate a random number in a given range. Here's an
example of some of the alternatives. Since the answers are random, your answers may be different from
these example answers. This shows a few of many techniques available to generate random data samples in
particular ranges.
>>> import random
>>> random.randrange(6)
5
>>> random.randrange(1,7)
6
>>> random.randrange(2,37,2)
6
>>> random.randrange(1,36,2)
13
1. We're asking for a random number, n, such that $0 \le n < 6$. The number will be between 0 and 5,
inclusive.
2. We're asking for a random number, n, such that $1 \le n < 7$. The number will be between 1 and 6,
inclusive.
3. We're asking for a random even number, n, such that $2 \le n < 37$. The range function is defined by
start, stop and step values. When the step is 2, then the values used are 2, 4, 6, ..., 36.
4. We're asking for a random odd number, n, such that $1 \le n < 36$. The number will be between 1 and
35, inclusive. Here, we start from 1 with a step of 2; the values used are 1, 3, 5, ..., 35.
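To convince ourselves that the ranges behave as described, we can draw a lot of samples and look at the extremes. This is only a sketch: the function roll_die() and the list names are invented, and it uses a few features (def, lists, for) that are covered in later chapters.

```python
import random

def roll_die():
    """One six-sided die: a whole number from 1 to 6."""
    return random.randrange(1, 7)

# Draw 1000 samples from each of three ranges.
rolls = [roll_die() for _ in range(1000)]
evens = [random.randrange(2, 37, 2) for _ in range(1000)]
odds = [random.randrange(1, 36, 2) for _ in range(1000)]

print(min(rolls))
print(max(rolls))
```

With this many samples the smallest and largest rolls will almost certainly be 1 and 6, but because the values are random, any single run is free to differ.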
The random module contains the following functions for working with simple distributions of random numbers. There are several more sophisticated distributions available for more complex kinds of simulations.
Casino games only require these functions.
Octal Numbers. Octal numbers use base 8. In Python, we begin octal numbers with a leading zero. Each
octal digit's place value is a power of 8. We have the 512's place, the 64's place, the 8's place and the 1's
place. A number like 04211 is $4 \times 512 + 2 \times 64 + 1 \times 8 + 1$. This has a value of 2185.
Each group of three bits forms an octal digit. This saves us from writing out all those bits in detail. Instead,
we can summarize them.
Binary: 1-0-0  0-1-0  0-0-1  0-0-1
Octal:    4      2      1      1
Hexadecimal Numbers. Hexadecimal numbers use base 16. In Python, we begin hexadecimal numbers
with a leading 0x. Since we only have 10 digits, and we need 16 digits, we'll borrow the letters a, b, c, d,
e and f to be the extra digits. Each hexadecimal digit's place value is a power of 16. We have the 4096's
place, the 256's place, the 16's place and the 1's place. A number like 0x8a9 is $8 \times 256 + 10 \times 16 + 9$, which
has a value of 2217.
Each group of four bits forms a hexadecimal digit. This saves us from writing out all those bits in detail.
Instead, we can summarize them.
Binary:      1-0-0-0  1-0-1-0  1-0-0-1
Hexadecimal:    8        a        9
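The place-value sums above can be checked directly. This sketch assumes the built-in bin() function, which shows a number's bits as a string, and uses int() with an explicit base for the octal digits so it doesn't depend on how a given Python version writes octal literals. The names octal_value and hex_value are invented.

```python
octal_value = 4 * 512 + 2 * 64 + 1 * 8 + 1
hex_value = 8 * 256 + 10 * 16 + 9

print(octal_value == int('4211', 8))
print(hex_value == 0x8a9)

# bin() shows the underlying bits as a string.
print(bin(octal_value))
print(bin(hex_value))
```

Reading the bits in groups of three (or four) from the right gives back the octal (or hexadecimal) digits.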
Bytes. A byte is 8 bits. That means that a byte contains bits with place values of 128, 64, 32, 16, 8, 4, 2,
1. If we set all of these bits to 1, we get a value of 255. A byte has 256 distinct values. Computer memory
is addressed at the individual byte level; that's why you buy memory in units measured in megabytes or
gigabytes.
In addition to small numbers, a single byte can store a single character encoded in ASCII. It takes as many
as four bytes to store characters encoded with Unicode.
An integer has 4 bytes, which is 32 bits. In looking at the special operators, we'll look at them using integer
values. Python can work with individual bytes, but it does this by unpacking a byte's value and saving it in
a full-sized integer.
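The arithmetic behind a byte's 255 and 256 can be sketched as follows. The list-building line is a preview of later material, and the names place_values, largest_byte and distinct_values are invented for the illustration.

```python
# Powers of two from 2**7 down to 2**0: the eight place values in a byte.
place_values = [2 ** position for position in range(7, -1, -1)]

largest_byte = sum(place_values)   # every bit switched on
distinct_values = 2 ** 8           # counting zero as a value, too

print(place_values)
print(largest_byte)
print(distinct_values)
```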
Note that the result of the hex() function is technically a string. An ordinary number would be presented
as a decimal value, and couldn't contain the extra hexadecimal digits. That's why there are apostrophes in
our output.
The oct() function converts its argument to an octal (base 8) string. A leading 0 is placed on the string as
a reminder that this is octal not decimal. Here are some examples:
>>> oct(512)
'01000'
>>> oct(509)
'0775'
In base 2, the place values are 32, 16, 8, 4, 2, 1. The string '010101' is evaluated as $1 \times 16 + 1 \times 4 + 1 = 21$.
In base 4, the place values are 16, 4 and 1. The string '321' is evaluated as $3 \times 16 + 2 \times 4 + 1 = 57$.
Recall from Octal and Hexadecimal - Counting by 8's or 16's that we have to press additional symbols into
service to represent base 16 numbers. We use the letters a-f for the digits after 9. The place values are 256,
16, 1; the string '2ac' is evaluated as $2 \times 256 + 10 \times 16 + 12 = 684$.
While it seems so small, it's really important that numbers in another base are written using strings. To
Python, 123 is a decimal number. '123' is a string, and could mean anything. When you say int('123',4),
you're telling Python that the string '123' should be interpreted as a base 4 number (which maps to 27 in
base 10 notation.) On the other hand, when you say int('123'), you're telling Python that the string
'123' should be interpreted as a base 10 number, which is 123.
int(object [, base ]) → number
Generates an integer from the value object. If object is a string, and base is supplied, object must be a
proper number in the given base. If base is omitted, and object is a string, it must be decimal.
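Here are the conversions from the discussion above, gathered into one sketch so you can run them together. The list name conversions is invented for the illustration.

```python
conversions = [
    int('010101', 2),   # base 2
    int('321', 4),      # base 4
    int('2ac', 16),     # base 16
    int('123', 4),      # the same digits read as base 4 ...
    int('123'),         # ... and as ordinary base 10
]
print(conversions)
```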
What makes this murky is the way Python interprets the number as having a sign. The computer hardware
uses a very clever trick to handle signed numbers. First, let's visualize the unsigned, binary number line;
it has 4 billion positions. At the left we have all bits set to zero. In the middle we have a value where the
2-billionth place is 1 and all other values are zero. At the right we have all bits set to one.
Here's the same kind of example, combining sequences of bits. This takes a bit of conversion to base 2 to
understand what's going on.
>>> 3 & 5
1
The number 3, in base 2, is 0011. The number 5 is 0101. Let's match up the bits from left to right:
  0 0 1 1
& 0 1 0 1
---------
  0 0 0 1
This is a very low-priority operator, and almost always needs parentheses when used in an expression with
other operators. Here are some examples that show you how & and + combine.
>>> 3&2+3
1
>>> 3&(2+3)
1
>>> (3&2)+3
5
The ^ operator
The binary ^ operator returns a 1-bit if exactly one of the two inputs is 1, but not both. This is sometimes called
the exclusive or operation to distinguish it from the inclusive or. Some people write and/or to emphasize
the inclusive sense of or. They write either-or to emphasize the exclusive sense of or.
>>> 3^5
6
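To see why 3^5 is 6, it can help to print the bit patterns side by side. The following is a small sketch; it uses the built-in format() function with the '04b' format code to show each number as four binary digits, and the names a and b are just for illustration.

```python
a, b = 3, 5

# format(n, '04b') renders n as a 4-digit binary string.
print(format(a, '04b'))       # 0011
print(format(b, '04b'))       # 0101

# & gives a 1-bit only where both inputs have a 1-bit.
print(format(a & b, '04b'))   # 0001

# ^ gives a 1-bit where exactly one input has a 1-bit.
print(format(a ^ b, '04b'))   # 0110, which is 6 in decimal
```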
Let's look at this in a little bit of detail. Our first expression has two or operations; they're the lowest priority
operators. The first or operation combines 3&0x1f with 0x80. So, Python does the following steps to evaluate this
expression.
1. Calculate the and of 3 and 0x1f. This is 3 (try it and see). You can work it out by hand if you know
that 3 is 0-0-0-1-1 in binary and 0x1f is 1-1-1-1-1.
2. Calculate the or of the previous result (3) and 0x80. This is 0x83.
3. Calculate the or of the previous result (0x83) and 0x100. This has the decimal value of 387.
4. Calculate the hex string for the previous result, using the _ short-hand for the previously printed result.
This shows that the hex value is 0x183, what we expected.
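The steps above can be replayed one at a time. This is a sketch: the one-line expression at the end is reconstructed from the steps and is an assumption about how the original was written, and the step names are only for illustration.

```python
step1 = 3 & 0x1f        # and: keeps the low five bits of 3
step2 = step1 | 0x80    # or: turns on the 0x80 bit
step3 = step2 | 0x100   # or: turns on the 0x100 bit

# Because & has higher priority than |, the same work fits on one line.
combined = 3 & 0x1f | 0x80 | 0x100

print(step1)
print(hex(step2))
print(step3)
print(hex(combined))
```

In a script there is no _ short-hand, so the result is saved in a variable before calling hex().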
The << Operator
The << is the left-shift operator. The left argument is the bit pattern to be shifted, the right argument is
the number of bits. This is mathematically equivalent to multiplying by a power of two, but much, much
faster. Shifting left 3 positions, for example, multiplies the number by 8.
This operator is higher priority than &, ^ and |. Be sure to use parentheses appropriately.
>>> 0xA << 2
40
0xA is hexadecimal; the bits are 1-0-1-0. This is 10 in decimal. When we shift this two bits to the left, it's
like multiplying by 4. We get bits of 1-0-1-0-0-0. This is 40 in decimal.
The >> Operator
The >> is the right-shift operator. The left argument is the bit pattern to be shifted, the right argument
is the number of bits. Python always behaves as though it is running on a 2's complement computer. The
left-most bit is always the sign bit, so sign bits are shifted in. This is mathematically equivalent to dividing
by a power of two, but much, much faster. Shifting right 4 positions, for example, divides the number by 16.
This operator is higher priority than &, ^ and |. Be sure to use parentheses appropriately.
>>> 80 >> 3
10
The number 80, with bits of 1-0-1-0-0-0-0, shifted right 3 bits, yields bits of 1-0-1-0, which is 10 in
decimal.
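Here is a short sketch comparing both shift operators with the multiplication and division they stand for. The negative examples are an addition to the text: they show the sign being preserved, and that a right shift rounds down in the same way the // operator does.

```python
left = 0xA << 2          # same as 10 * 2**2
right = 80 >> 3          # same as 80 // 2**3

# Negative numbers keep their sign because sign bits are shifted in.
negative = -80 >> 3      # same as -80 // 8
rounded_down = -81 >> 3  # rounds toward negative infinity, like -81 // 8

print(left)
print(right)
print(negative)
print(rounded_down)
```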
Tip: Debugging Special Operators
The most common problem with the bit-fiddling operators is confusion about the relative priority of the
operations. For conventional arithmetic operators, ** is the highest priority, * and / are lower priority and
+ and - are the lowest priority. However, among &, ^, |, << and >> it isn't obvious what the priorities
are or should be.
When in doubt, add parentheses to force the order you want.
We'll have to use our bit-fiddling operators to unwind this compressed data into a form we can process.
First, we'll look at getting the red, green and blue values out of a single plain integer.
We can code 256 levels in 8 bits, which is two hexadecimal digits. This gives us red, green and blue
levels from 0x00 to 0xFF (0 to 255 decimal). We can string the red, green and blue together to make
a larger composite number like 0x0c00a2 for a very bluish purple.
What is 0x0c00a2 & 0xff? Is this the blue value of 0xa2? Does it help to do hex( 0x0c00a2 &
0xff)?
What is (0x0c00a2 & 0xff00) >> 8? hex( (0x0c00a2 & 0xff00) >> 8 )?
What is (0x0c00a2 & 0xff0000) >> 16? hex( (0x0c00a2 & 0xff0000) >> 16 )?
2. Division.
How can we break a number down into different digits?
What is 1956 / 1000? 1956 % 1000?
What is 956 / 100? 956 % 100?
What is 56 / 10? 56 % 10?
What happens if we do this procedure with 1956., 956. and 56. instead of 1956, 956 and 56? Can
we use the // operator to make this work out correctly?
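For the color exercise, the mask-then-shift pattern can be wrapped up in a pair of small functions. This is only a sketch: split_rgb() and join_rgb() are hypothetical names, not part of Python, and the sample color is deliberately different from the one in the exercise.

```python
def split_rgb(color):
    """Unpack a 24-bit color into (red, green, blue) levels, each 0 to 255."""
    red = (color & 0xff0000) >> 16   # mask the top byte, then shift it down
    green = (color & 0x00ff00) >> 8  # mask the middle byte, then shift it down
    blue = color & 0x0000ff          # the low byte needs no shift
    return red, green, blue

def join_rgb(red, green, blue):
    """Pack three 0 to 255 levels back into one 24-bit color."""
    return (red << 16) | (green << 8) | blue

orange = 0xff8000   # sample value chosen for this sketch
parts = split_rgb(orange)
print(parts)
print(hex(join_rgb(*parts)))
```

The parentheses around each & expression matter: >> has higher priority than &, so without them the shift would be applied to the mask instead of the masked value.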
While most features of Python correspond with common expectations from mathematics and other programming languages, the division operator, /, has a complication. This is due to the lack of a common expectation
for what division should mean.
Sometimes we expect division to create precise answers, usually the floating-point equivalents of fractions. 355/113 should be 3.1415929203539825.
Other times, we want a rounded-down integer result. 355/113 should be 3.
A basic tenet of Python is that the data determine the result of an operation. Since the two values in
355/113 are integers, the result is an integer.
There are numerous circumstances under which we'd prefer an exact answer, however.
There's no best answer. Sometimes we mean one and other times we mean the other. We need to explicitly
name the division operation we intend.
To see the effect of this assumption, try the following to see what Python does.
355/113
355.0/113
355/113.0
355.0/113.0
The Unexpected Integer. Here are two examples of the classical definition of division. We've used the
formula for converting 18 Celsius to Fahrenheit. The first version uses integers, and gets an integer result.
The second uses floating-point numbers, which means the result is floating-point.
>>> 18*9/5+32
64
>>> 18.0*9.0/5.0 + 32.0
64.400000000000006
In the first example, we got an inaccurate answer from a formula that we are sure is correct. We expected
a correct answer of 64.4, but got 64.
In Python 2, when a formula has a / operator, the inaccuracy will stem from the use of integers where
floating-point numbers were more appropriate. (This can also occur using integers where complex numbers
were implicitly expected.)
If we use floating-point numbers, we get a value of 64.4, which is correct. Try this and see.
18.0*9.0/5.0 + 32.0
Noisy Solutions. The problem we have is reconciling the basic rule of Python (data determines the result)
and the two conflicting meanings for division. We have a couple of choices for the solution.
We can solve this by using explicit conversions like float() or int(). However, we'd like Python to be a simple
and sparse language, without a dense clutter of conversions to cover the rare case of an unexpected type of
data. So this isn't ideal.
This is an over-the-top, worst-case example of using explicit coercion.
>>> float(18)*float(9)/float(5) + float(32)
64.400000000000006
Explicit is Better. As part of being explicit, Python offers us two division operators.
For precise fractional results, we'll use /.
When we want division to simply compute the quotient, Python has a second division operator, //.
This produces rounded-down integer answers, even if both numbers happen to be floating-point.
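Here is a small sketch of the two operators side by side, reusing the Celsius formula from earlier. It is written with explicit floating-point values for / so that it behaves the same way under either meaning of /; the variable names are only for illustration.

```python
celsius = 18

# // always computes the rounded-down quotient: 162 // 5 is 32.
truncated = celsius * 9 // 5 + 32

# / with floating-point values gives the precise fractional result.
precise = celsius * 9.0 / 5.0 + 32.0

# // rounds down even when both numbers are floating-point...
floor_of_floats = 7.0 // 2.0

# ...and "down" means toward negative infinity, not toward zero.
floor_of_negative = -7 // 2

print(truncated)
print(precise)
print(floor_of_floats)
print(floor_of_negative)
```

Note that 7.0 // 2.0 is 3.0, not 3: the answer is rounded down, but it is still a floating-point number because the inputs were.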
Old vs. New Division. While using Python 2, we need to specify which meaning of / should apply. Do
we mean the original Python 2 definition (data type determines results)? Or do we mean the newer meaning
of / (exact results)?
Python 2 gives us two tools to specify the meaning of the / operator: a statement that can be placed in a
program, as well as a command-line option that can be used when starting the Python program.
Program Statements to Control /. To ease the transition from older to newer language features, the
statement from __future__ import division changes the definition of the / operator from Python 2
(depends on the arguments) to Python 3 (always produces floating-point).
Note that the __future__ module name has two underscores before and after future. Also, note that this must
be the first executable statement in a script.
Here's the classic division:
>>> 18*9/5+32
64
Here's how it looks when we start Python with the -Qold option.
1. Here is the python command with the -Qold option. This will set Python to do classical interpretation
of the / operator.
5. When we do old-style / division with integers, we get an integer result.
7. When we do old-style / division with floating-point numbers, we get the precise floating-point result.
9. When we do // division with floating-point numbers, we get the rounded-down result.
Here's how it looks when we start Python with the -Qnew option.
1. Here is the python command with the -Qnew option. This will set Python to do the new interpretation
of the / operator.
5. When we do new-style / division with integers, we get the precise floating-point result.
7. When we do new-style / division with floating-point numbers, we get the precise floating-point result.
9. When we do // division with floating-point numbers, we get the rounded-down result.
Why All The Options? There are two cases to consider here.
If you have an old program, you may need to use -Qold to force an old module or program to work the way it
used to.
If you want to be sure you're ready for Python 3, you can use the -Qnew option to be sure that you always have
the exact quotient version of / instead of the classical version.
Important: Debugging The -Q Option
If you misspell the -Q option you'll see errors like the following. If so, check your spelling carefully.
MacBook-5:~ slott$ python -Qwhat
-Q option should be `-Qold', `-Qwarn', `-Qwarnall', or `-Qnew' only
usage: Python [option] ... [-c cmd | -m mod | file | -] [arg] ...
Try `python -h' for more information.
If you get a message that includes Unknown option: -q, you used a lower-case q instead of an upper-case
Q.
There's a classic trick that can be used to solve this problem: use scaled numbers. When doing dollars and
cents math, you can scale everything by 100, and do the math in pennies. When you print the final results,
you can scale the final result into dollars with pennies to the right of the decimal point. This section will
provide you some pointers on doing this kind of numeric programming.
Later, in Fixed-Point Numbers : Doing High Finance with decimal we'll look at the decimal module, which
does this in a more sophisticated and flexible way.
Scaled Numbers. When we use scaled numbers, it means that the proper value is represented as the
scaled value and a precision factor. For example, if we are doing our work in pennies, the value of $12.99 is
represented as a scaled value of 1299 with a precision of 2 digits. The precision factor can be thought of as
a power of 10. In our case of 12.99, our precision is 2. We can multiply by 10**-precision to convert our scaled
number into a floating-point approximation.
We have three cases to think about when doing fixed-point math using scaled integers: addition (and subtraction), multiplication and division. Addition and subtraction don't change the precision. Multiplication
increases the precision of the result and division reduces the precision. So, we'll need to look at each case
carefully.
Addition and Subtraction. If our two numbers have the same precision, we can simply add or subtract
normally. This is why we suggest doing everything in pennies: the precisions are always 2, which always
match. If our two numbers have different precisions, we need to shift the smaller precision number. We do
this by multiplying by an appropriate power of 10.
What is $12.00 + $5.99? Assume we have 12 (the precision is dollars) and 599 (the precision is pennies).
We add them like this: 12*100 + 599. We applied the penny precision factor of 100 to transform dollars
into pennies.
Multiplication. When we multiply two numbers, the result has the sum of the two precisions. If we multiply
two amounts in pennies (2 digits to the right of the decimal point), the result has 4 digits of precision. We
have to be careful when doing this kind of math to determine the rounding rules, and correctly scale the
result.
What is 7.5% of $135.99? Assume we have 13599 (the precision is pennies, 2 digits after the decimal point)
and 75 (the precision is 10th of a percent, three digits to the right of the decimal point). When we multiply,
our result will have precision of 5 digits to the right of the decimal point. The result (1019925) represents
$10.19925. We need to both round and shift this back to have a precision of 2 digits to the right of the
decimal point.
We can both round and scale with an expression like the following. The *.001 resets the scale from 5 digits
of precision to 2 digits of precision.
>>> int(round(13599L*75,-3)*.001)
1020
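The addition and multiplication cases can also be worked entirely in integers. The sketch below is one way to do it, not the only one: instead of round() and a floating-point scale factor, it adds half of the divisor before a // division, which rounds positive amounts half-up. The variable names are only for illustration.

```python
# Addition: $12.00 + $5.99, shifting dollars (precision 0) to pennies (precision 2).
dollars = 12
pennies = 599
total_pennies = dollars * 100 + pennies

# Multiplication: 7.5% of $135.99.
amount = 13599          # pennies, precision 2
rate = 75               # 0.075, precision 3
raw = amount * rate     # precision 2 + 3 = 5

# Drop 3 digits of precision (divide by 1000), rounding half-up.
tax_pennies = (raw + 500) // 1000

# Scale back to dollars and cents only when printing.
print("%d.%02d" % divmod(total_pennies, 100))
print("%d.%02d" % divmod(tax_pennies, 100))
```

This prints 17.99 and 10.20. The half-up trick shown here assumes non-negative amounts; negative amounts would need their own rounding rule.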
This means that the labor rate was $108.80 per hour.
The Bigger Picture. Whew! It looks like the special cases of adding (and subtracting), multiplying and
dividing are really complex.
There's a trick to this, and the trick is to begin with the goal in mind and work backwards to what data
we need to satisfy our goal. For adding and subtracting, our goal precision can't be different from our input
precision. When multiplying and dividing, we work backwards: we write down our goal precision, we write
down the precision from our calculation, and we work out rounding and scaling operations to get from our
calculation to our goal.
It turns out that this trick is essential to programming. We'll return to it time and again.
Python has some sophisticated expression operators. Some of them transcend the simple add-subtract-multiply-divide category, and include operators that apply a function to a list to create a new list, apply a
function to filter a list and apply a function to reduce a list to a single value.
When we evaluate a function like abs(-4), we name the -4 an argument to the function abs(). When looking
at 3+4, we could consider 3 and 4 to be argument values to the +() function. We could hypothetically
imagine rewriting 3+4 to be +(3,4) just to show what it really means.
Imperative Approach. On the other hand, the imperative style is characterized by using a sequential list
of individual statements. Donald Knuth, in his Art of Computer Programming [Knuth73], shows a language
he calls Mix. It is a purely imperative language, and is similar to the hardware languages used by many
computer processor chips.
The imperative style lists a series of commands that the machine will execute. Each command changes the
value of a register in the central processor, or changes the value of a memory location. In the following
example, each line contains the abbreviation for a command and a reference to a memory location or a
literal value. Memory locations are given names to make them easy to read. Literal values are surrounded
by =. The following fragment uses memory locations named C and F, as well as a processor register.
LDA  C
MUL  =9=
DIV  =5=
ADD  =32=
STA  F
The first command loads the processor's a register with the value at memory location C. The second command
multiplies the register by 9. The third command divides the register by 5. The next command adds 32 to the
register. The final command stores the contents of the a register into the memory location of the variable F.
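To make the comparison concrete, the five commands can be imitated in Python with one statement per command. This is only a sketch of the imperative style: the variable a stands in for the processor's register, and the starting value of C is a sample chosen for illustration.

```python
C = 18.0     # memory location C (sample value, an assumption for this sketch)

a = C        # LDA C     load the register from C
a = a * 9    # MUL =9=   multiply the register by 9
a = a / 5    # DIV =5=   divide the register by 5
a = a + 32   # ADD =32=  add 32 to the register
F = a        # STA F     store the register into F

print(F)
```

The same calculation as a single expression would be F = C * 9 / 5 + 32; the imperative version spells out every intermediate change to the register.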
Python. Python, like many popular languages, has elements drawn from both applicative and imperative
realms. We'll focus initially on expressions and expression evaluation, minimizing the imperative statements.
We'll then add various procedural statements to build up to the complete language.
The basic rule is that each statement is executed by first evaluating all of the expressions in that statement,
then performing the statement's task. The evaluation of each expression is done by evaluating the parameters
and applying the functions to the parameters.
This evaluate-apply rule is so important, we'll repeat it here so that you can photocopy this page and make a
counted cross-stitch sampler to hang over your computer. Yes, it's that important.
Important: The Evaluate-Apply Rule
Each statement is executed by (1) evaluating all of the expressions in that statement, then (2) performing
the statement's task.
The evaluation of an expression is done by (1a) evaluating all parameters and (1b) applying the function to
the parameters.
Example: (2+3)*4 evaluates two parameters, 2+3 and 4, and applies the function *. In order to evaluate
2+3, there are two more parameters, 2 and 3, and a function of +.
While it may seem excessive to belabor this point, many programming questions arise from a failure to fully
grasp this concept. We'll return to it several times, calling it the evaluate-apply cycle. For each feature of
the language, we need to know what happens when Python does its evaluation. This is what we mean by
the semantics of a function, statement or object.
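One way to see the evaluate-apply cycle is to write the operators as ordinary function calls. Python's standard operator module provides add() and mul() functions that do the same work as + and *; the sketch below uses them to spell out (2+3)*4 step by step. The variable names are only for illustration.

```python
import operator

# Evaluate the inner parameters first: 2 and 3, then apply +.
inner = operator.add(2, 3)

# Now both parameters of * are known: apply * to them.
result = operator.mul(inner, 4)

# The nested form shows the whole cycle in one expression.
nested = operator.mul(operator.add(2, 3), 4)

print(inner)
print(result)
print(nested == (2 + 3) * 4)
```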
Another Imperative Example. Here's another example of the imperative style of programming. This
style is characterized by using a sequential list of individual statements. This imperative language is used
internally by Python.
In the following example, each line contains an offset, the abbreviation for a command and a reference to a
variable name or a literal value. Variable names are resolved by Python's namespace rules. The following
fragment uses a variable named c.
2     0 LOAD_FAST        0 (c)
      3 LOAD_CONST       1 (9)
      6 BINARY_MULTIPLY
      7 LOAD_CONST       2 (5)
     10 BINARY_DIVIDE
     11 LOAD_CONST       3 (32)
     14 BINARY_ADD
The first command (at offset 0) pushes the object associated with the variable named c on the top of the
arithmetic processing stack. The second command (at offset 3) loads the constant 9 on the top of the stack.
The third command (at offset 6) multiplies the top two values on the stack. This leaves a new value on the
top of the stack.
The fourth command (at offset 7) pushes a constant 5 onto the stack. The fifth command (at offset 10)
performs a division operation between the top two values on the stack.
The sixth command (at offset 11) pushes a constant 32 onto the stack. Finally, the seventh command (at offset 14) performs an add operation between the top two values on the stack.
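A listing like this can be produced with the standard dis module. The sketch below assumes a small function named fahrenheit() (a hypothetical name chosen for this example) and asks Python to disassemble it. The exact command names and offsets depend on the version of Python; newer versions, for instance, may show a single generic binary-operation command in place of BINARY_MULTIPLY, BINARY_DIVIDE and BINARY_ADD, so no expected output is shown.

```python
import dis

def fahrenheit(c):
    # The same calculation as the listing: c times 9, divided by 5, plus 32.
    return c * 9 / 5 + 32

# Print the internal commands Python uses to evaluate the function body.
dis.dis(fahrenheit)
```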
Instead, we prefer:
int(22.0/7.0)
A long expression may be broken up with spaces to enhance readability. For example, the following separates
the multiplication part of the expression from the addition part with a few wisely-chosen spaces.
b**2 - 4*a*c
CHAPTER
FIVE
PROGRAMMING ESSENTIALS
The Input-Process-Output Pattern
It's often helpful to look at programs using a typical pattern called input-process-output. We'll work
through this pattern backwards. In order to see output from a script, we'll need to use the print statement.
We'll look at this in Seeing Results : The print Statement.
Once we are comfortable with the print statement, we can introduce processing in Turning Python Loose
With a Script. When we start making more finished and polished programs, we're going to want to make
them easy to use. There are a lot of options and shortcuts available to us, and we'll touch on a few of them
here. Later, we'll add even more ease-of-use features.
In order to do processing, we'll introduce variables and the assignment statement in Expressions, Constants and Variables. This will allow us to do the basic steps of processing. We'll describe some additional
features in Assignment Bonus Features.
When we add input in Can We Get Your Input?, we'll have all three parts to the input-process-output
pattern.
Redirection
It is important to note that each shell has ways to redirect the standard output file. Python has
considerable flexibility, and so does the shell that runs Python. Too many choices is either confusing
or empowering. We'll limit ourselves to looking at the choices Python gives. You can, however, look
at your GNU/Linux shell documentation or Windows Command Prompt documentation to see what
additional choices you have.
When we are interacting with Python at the >>> prompt and we give Python an expression, the result is
printed automatically. This is the way Python responds when interacting with a person. When we run a
script, however, we won't be typing each individual statement, and Python won't automatically print the
result of each expression. Instead, we have to tell Python to show us results by writing a statement using
the print() function that shows the response we want.
The print() function isn't automatically available in Python 2. It will be automatically available in Python
3.
To use the print() function, we need to include the following statement.
from __future__ import print_function
This must be the first statement in a script. It alerts Python that we won't be using the print statement,
and we will be using the print() function.
The basic print() function looks like this:
print(value, ..., sep=' ', end='\n', file=sys.stdout)
The print() function converts each value to a string and writes them to the given file (by default it's
standard output).
Important: Statement Syntax Rules
We used an ellipsis (...) to indicate something that can be repeated. There's no real upper limit on the
number of times something can be repeated.
We used sep=' ' to show two things. First, if this parameter is used, it must be given by name. Second, the
parameter has a default value. That means it can be safely ignored for now.
Here are some examples of a basic print() function.
from __future__ import print_function
print(22/7, 22./7.)
print(335/113, 335./113.)
print( ((65 - 32) * (5 / 9)) )
We'll look at the special purpose sep, end and file arguments separately. For now, it's important to note
that they have default values, making them optional.
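As a small preview of those three arguments, here is a sketch that overrides each default once. It assumes Python 3, where print() is built in and io.StringIO accepts ordinary strings; the name buffer is only for illustration. Writing into an in-memory file makes it easy to see exactly which characters print() produced.

```python
import io
import sys

# An in-memory text file stands in for a real output file.
buffer = io.StringIO()

# sep replaces the space between values; end replaces the newline.
print(22, 7, sep='/', end=' = ', file=buffer)

# All defaults except file: a space separator and a newline at the end.
print(22 / 7.0, file=buffer)

# Copy the captured text to standard output to look at it.
sys.stdout.write(buffer.getvalue())
```

Because the first print() ends with ' = ' instead of a newline, the two calls together produce a single line of output.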
print expression [ , expression ] ...
The print statement converts the expressions to strings and writes them to standard output.
Important: Statement Syntax Rules
We'll show optional clauses in statements by surrounding them with [ and ]'s. We don't actually enter the [
]'s; they surround optional clauses to show us what alternative forms of the statement are.
We use a trailing ellipsis (...) to indicate something that can be repeated. There's no real upper limit on
the number of times something can be repeated.
Also notice that we put a , before the expression. This is your hint that expressions are separated with ,
characters when you have more than one.
In the case of print, the syntax summary shows us there are many different ways to use this statement:
We can say print expression with one expression.
We can say print expression, expression with two expressions, separated by ,.
And so on, for any number of expressions, separated by ,'s.
While our summary doesn't show this, there are several other forms for the print statement. All of the
extra syntax options and quirks of the print statement are really just fodder for confusion.
Here are some examples of a basic print statement.
print 22/7, 22./7.
print 335/113, 335./113.
print ((65 - 32) * (5 / 9))
We're mostly going to ignore the print statement because the print() function does the same thing and
has no quirks or odd special cases.
It's very important to note that the from __future__ import print_function must be provided first.
Tip: Debugging the print Statement
One obvious mistake you will make is misspelling print. You'll see NameError: name 'primpt' is not
defined as the error message. I've spelled it primpt so often, I've been tempted to rewrite the Python
language to add this as an alternative.
The other common mistake that is less obvious is omitting a comma between the values you are printing.
When you do this, you'll see a SyntaxError: invalid syntax message.
If the result of a print statement doesn't look right, remember that you can always enter the various expressions directly into IDLE's Python shell to examine the processing one step at a time.
How can I direct output to stderr? We'll talk about this in detail in External Data and Files. However,
if you can't wait until then, we'll provide some hints as to what will come in the future.
from __future__ import print_function
import sys
print( "an error", file=sys.stderr )
How can I use multiple statements to create one line of output? The print() function's default
is to put a newline character at the end, which may not always be desirable.
If we change the end parameter, we can piece together a long line of output from multiple uses of
the print() function. In the following example, the first statement uses ': ' instead of the newline
character; the print() function will create a partial line.
from __future__ import print_function
print( "335/113", end=': ' )
print( 335.0/113.0 )
If we have very complex expressions, this can make our program easier to read by breaking a complex
message into understandable chunks.
This is not obvious when working with Python at the >>> prompt. When we turn to scripts (in the
next chapter), we'll see more use for this.
How do I produce binary output? MP3s, MOVs, JPEGs, etc. We'll talk about this in detail in
External Data and Files. There's no quick-and-dirty shortcut for that kind of operation; it requires
interacting with the file system. Also, these more sophisticated data formats require more sophisticated
programming.
Within IDLE, you create a file by using the File menu, the New Window item. This will create a new
window into which you can enter your two-line Python program. Check your spelling and spacing carefully.
When you use the File menu, Save item, be sure to read where the file is going to be saved. You'll notice
that IDLE may be starting in C:\Python26, your Macintosh HD, or /home/slott or some other unexpected
directory.
For now, it helps to save this file in your home directory. This could be C:\Documents and Settings\SLott,
or /home/slott.
You can't easily use a word processor for this, since word processors include a lot of formatting markup that
Python can't read. If you want to try and use an office product to create this kind of file, you have to be
absolutely sure that you save the file as pure text.
There are several ways we can run the Python interpreter and have it evaluate our script file. Since IDLE
is virtually identical on all platforms, we'll cover this next.
Important: Writing and Saving A Script
One of the biggest benefits of using IDLE is that your Python script has various syntax elements highlighted.
In mine, the keywords show up in orange, strings in green, and my expressions are black. If I misspell print,
it doesn't show up in orange, but shows up in black.
The most common problem we see is people saving their file to unexpected locations on their disk. It's
important to save the file to a directory where you can find it again.
One interesting confusion we've seen arises when people forget to save the file in the first place. IDLE
will ask you if you want to save the file when you attempt to run it. Sometimes this message is unexpected
and that can be confusing. Our advice is to save early and save often.
Alternatives to IDLE. If you don't want to use IDLE to create text files, you do have several choices for
nice program editors. These will require you to download and install additional software.
Windows. You have the built-in notepad, or you can purchase any of a large number of programmer's
text editors, including TextPad. There are free editors like jEdit, also.
MacOS. You have the built-in textedit application. Be sure to use the Format menu, Make Plain
Text menu item to strip the file down to just text. Or you can purchase any of a large number of
programmer's text editors, including BBEdit. There are free editors like jEdit, also.
GNU/Linux. If you are using GNOME, you have gedit. If you want, you can also use vim or
emacs, two very fine sophisticated editors that have been used for decades to write software.
After you create your file outside IDLE, you can open the file with IDLE in order to run it. You use the
File menu, Open... item to open a file you created outside IDLE. It's important to take note of where you
save files so that you can find them and open them again.
This shows us that the Python shell was restarted using our script as input. It also shows us the output
from our two print statements. We ran our first program.
Important: Debugging Aids in IDLE
If you have syntax errors, you'll see a pop-up dialog box named Syntax error with a message like There's
an error in your program: invalid syntax. You'll also notice that some part of your script will be
highlighted in red. This is near the error.
Since IDLE highlights various syntax elements, you can use the color as a hint. In mine, the keywords
show up in orange, strings in green, and my expressions are black. If I misspell print, it doesn't show up in
orange, but shows up in black.
If you have semantic errors, you'll see these in the shell window in red. For example, I got the following by
messing up my program.
Traceback (most recent call last):
  File "E:/Personal/NonProgrammerBook/notes/sample1.py", line 1, in -toplevel-
    print(65, "F"/2)
TypeError: unsupported operand type(s) for /: 'str' and 'int'
You can see my erroneous print statement: it has "F"/2. And you can also see Python's complaint. While
the syntax is acceptable, it doesn't mean anything to divide the letter "F" by 2.
We can fix our script file, save it, and re-run it. You'll notice that the Run Module menu item has a
short-cut key, usually F5. This edit-save-run cycle is how software gets built.
Why are there so many ways to use Python? Or, why can't we just use IDLE? The huge number
of choices is a natural consequence of creating a simple, flexible program. Many people use Python
through IDLE and are happy and successful in what they do.
More sophisticated problems, however, often require more complex use of Python. Since the Python
program (python, or python.exe) can be used in a variety of ways, we can use Python to build a number
of different kinds of solutions to our data processing problems.
In a book like this, we hate to present Python from a single point of view. We prefer to present a
number of choices so that different readers can locate one that looks like it will solve their problem.
Do I have to write a script? You don't have to write scripts; you can do everything through interaction
with IDLE. Scripting is not required, but it is generally the goal of programming. An automated
solution should be something that can be double-clicked, or something that is invoked by a web server.
__str__
_hidden
Important: Python Name Rules
Names that start with _ or __ have special significance:
Names that begin with _ are typically private to a module or class. We'll return to this notion of
privacy in Defining New Objects and Module Definitions : Adding New Concepts.
Names that begin with __ are part of the way the Python interpreter is built. We should never attempt
to create variables with names that begin with __ because it can lead to confusion between our
programs and Python's internals.
The Semantics of Naming. How does a variable name get stuck to an object? Primarily, the assignment
statement does this labeling. We'll look at the syntax of assignment in the next section. First, we need to
talk about what assignment means, then we'll look at how to write it.
An assignment statement has two parts: a variable name and an expression.
1. An assignment statement evaluates the expression.
2. An assignment statement then assigns the result to a variable. There are two variations on this.
If the variable name already existed, the name tag is removed from the old value. The old value
is no longer accessible through that name. The name is pinned on the new value.
If the variable name did not already exist, the name is created. Then the name is pinned on the
new value.
We generally don't worry about creating new variables; there's no cost, they're just names. Create as many
as you need to make your program's purpose crystal-clear.
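The name-tag picture can be tried out in a few lines. In this sketch the names total and snapshot are only for illustration; the point is that assignment evaluates the expression first and then moves the name, without touching any other name that was pinned on the old value.

```python
total = 10           # the name total is created and pinned on 10
snapshot = total     # a second name is pinned on the same value

# Step 1: evaluate total + 5 using the old value.
# Step 2: remove the name tag from 10 and pin it on 15.
total = total + 5

print(total)
print(snapshot)      # still pinned on the old value
```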
A Python variable has a scope of visibility. The scope is the set of statements that can reference this variable.
For our first programs, all of our variables will have a global scope: we can use any variable anywhere we
want. When we look at organizing our programs into separate sections beginning in Organizing Programs
with Function Definitions we'll see how Python's separation between local scopes and a global scope can
simplify our programming by localizing a name.
Important: A Script Is A Journey
If we think of a script as a journey from the first statement to the last, we'll often mark our progress along
that journey by setting variables. The values that are assigned to our variables amount to a big "You Are
Here" arrow showing us where we are.
When a variable is created or changed, the overall position along our journey has changed. The variables
start with initial values; the values change as inputs are accepted and our program executes the various
statements. Eventually the state of the variables indicates that we have reached our goal, and our program
can exit.
We emphasize this because some of the more common program design errors have to do with failure to use
variables correctly. We may see a program that doesn't set a variable to indicate when progress has been
made. The mistake is either a missing assignment statement or assignment to the wrong variable. We'll look
at ways to debug these kinds of problems in Where Exactly Did We Expect To Be?.
When we look at it this way, our variables are more than just labels slapped on objects. They have a
profound significance; they reflect the meaning of our program.
variable = expression
First, evaluate expression, creating some result object. Then, assign the given variable name to that result
object.
Here's a short script that contains some examples of assignment statements.
example3.py
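As a minimal sketch of the evaluate-then-label rule, a script of this kind might look like the following. The variable names and values are assumptions made for illustration, not necessarily the original listing.

```python
# Hypothetical values; each line evaluates the right side, then labels the result.
shares = 150
price = 5 + 3.0/8.0        # the expression is evaluated to 5.375 before price is assigned
value = shares * price     # uses the objects currently labeled shares and price
print(value)
```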
This assignment statement pattern is so common that the pattern had to be added to the language as the
augmented assignment statement. These augment the basic assignment with an additional operation. There
are several variations on this combo-pack assignment statement.
The most common application of this pattern is to accumulate a sum. This augmented assignment makes it
obvious what we are doing. For example, look at this common augmented assignment statement.
sum += v
This statement is a handy shorthand that means the same thing as the following:
sum = sum + v
We can use this to replace our first example above with something slightly simpler.
sum = 0
sum += 25
sum += 42
sum += 37
print("average", sum/3)
Here's a larger example that does more substantial calculation at each step. This shows the strength of this
statement.
Create the following file, named portfolio.py. In IDLE, you can run this using the Run Module item on
the Run menu.
portfolio.py
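A script in this spirit could accumulate the value of several stock positions, one augmented assignment per position. The share counts and prices below are invented for illustration; treat this as a sketch rather than the definitive listing.

```python
# Hypothetical holdings: each step adds shares * price for one position.
total = 0
total += 150 * 24.50    # 150 shares at 24.50
total += 75 * 61.25     # 75 shares at 61.25
total += 200 * 8.75     # 200 shares at 8.75
print(total)
```

Each statement does a multiplication and then folds the result into the running total, which is more than a plain counter could show.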
The other basic math operations can be used similarly, although the purpose gets obscure for some of the
possible operations. These include -=, *=, /=, %=, &=, ^=, |=, <<= and >>=.
Here's an interesting use of /=. This computes the various digits in a base-10 number from right to left.
>>> y=1956
>>> y%10
6
>>> y/=10
>>> y%10
5
>>> y/=10
>>> y%10
9
>>> y/=10
>>> y%10
1
Try the same basic sequence of operations using 16 instead of 10. You'll get a sequence of three numbers
before y is equal to zero. Compare that sequence of numbers to hex(1956).
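The session above depends on / producing a whole number when both values are integers. With from __future__ import division in effect, or in Python 3, / produces a floating-point result instead. A sketch that doesn't depend on the division mode can use the floor-division form //=. The names ones, tens, hundreds and thousands are ours.

```python
y = 1956
ones = y % 10        # 6
y //= 10             # y is now 195
tens = y % 10        # 5
y //= 10             # y is now 19
hundreds = y % 10    # 9
y //= 10             # y is now 1
thousands = y % 10   # 1
print(thousands)
print(hundreds)
print(tens)
print(ones)
```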
Tip: Debugging the Augmented Assignment Statement
There are two common mistakes in the augmented assignment statement. The first is to choose an illegal
variable name. If you get a SyntaxError: can't assign to literal or SyntaxError: invalid syntax
the most likely cause is an illegal variable name.
The other mistake is to have an invalid expression on the right side of the assignment operator. If the result
of an assignment statement doesn't look right, remember that you can always enter the various expressions
directly into IDLE's Python shell to examine the processing one step at a time.
# Results
print("first roll win", win)
print("first roll lose", lose)
print("first roll establishes a point", point)
There's a 22.2% chance of winning, and an 11.1% chance of losing. What's the chance of establishing a point?
One way is to figure that it's what's left after winning or losing. The total of all probabilities always adds
to 1. Subtract the odds of winning and the odds of losing and what's left is the odds of setting a point.
More Probability
Here's another way to figure the odds of rolling 4, 5, 6, 8, 9 or 10.
point = 0
point += 2*3/36 # ways to roll 4 or 10
point += 2*4/36 # ways to roll 5 or 9
point += 2*5/36 # ways to roll 6 or 8
print( point )
By the way, you can add the statement print(win + lose + point) to confirm that these odds all add to
1. This means that we have defined all possible outcomes for the come out roll in Craps.
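The whole calculation can be sketched in one place. The steps are inferred from the probabilities quoted in the text (22.2% to win, 11.1% to lose), so treat this as an illustration rather than the exact craps.py listing. Dividing by 36.0 keeps the arithmetic floating-point whether or not the division import is in effect.

```python
# Come-out roll odds; 36.0 forces floating-point division in any Python.
win = 0
win += 6/36.0     # ways to roll 7
win += 2/36.0     # ways to roll 11
lose = 0
lose += 1/36.0    # ways to roll 2
lose += 2/36.0    # ways to roll 3
lose += 1/36.0    # ways to roll 12
point = 1         # start with certainty...
point -= win      # ...remove the winning outcomes
point -= lose     # ...remove the losing outcomes
total = win + lose + point
print(round(total, 6))
```

The subtraction approach and the direct count of 4, 5, 6, 8, 9 and 10 should agree, apart from tiny floating-point differences in the last digits.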
How To Make A Trace. When we make an execution trace, we start with a clean piece of paper. As we
look at our Python source statements, we write down the variables and their values on the paper. From this,
we can see the state of our calculation evolve.
When we encounter an assignment statement, we look on our paper for the variable. If we find the variable,
we put a line through the old value and write down the new value. If we don't find the variable, we add it
to our page with the initial value.
Here's our example from the craps.py script through the first part of the script. The win variable was created
and set to 0, then the value was replaced with 0.16, and then replaced with 0.22. The lose variable was
then created and set to 0. This is what our trace looks like so far.
win:  0.0 (crossed out), 0.16 (crossed out), 0.22
lose: 0
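Here's our example when the craps.py script is finished. We changed the variable lose several times. We also
added and changed the variable point.
win:   0.0 (crossed out), 0.16 (crossed out), 0.22
lose:  0.0 (crossed out), 0.027 (crossed out), 0.083 (crossed out), 0.111
point: 1.0 (crossed out), 0.77 (crossed out), 0.66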
We can use this trace technique to understand what a program means and how it proceeds from its initial
state to its final state.
You'll want to rewrite these exercises using variables to get ready to add input functions.
2. State Change.
Is it true that all programs simply establish a state?
It can be argued that a controller for a device (like a toaster or a cruise control) simply maintains a steady
state. The notion of state change as a program moves toward completion doesn't apply because the
software is always on. Is this the case, or does the software controlling a device have internal state
changes?
For example, consider a toaster with a thermostat, a brownness sensor and a single heating element.
What are the inputs? What are the outputs? Are there internal states while the toaster is making
toast?
variable , ... = expression , ...
The ... means that the variable or expression can be repeated any number of times. The , means that we
separate multiple variables and multiple expressions with ,s.
We must have the same number of variables on the left as expressions on the right.
Examples. In all of the examples, we'll try to show a pattern where two variables are tightly coupled. We
use this when we don't want the assignments to get separated in our program. We want the two assignments
in the same statement to emphasize how tightly coupled the two variables are.
price, shares = 5 + 3./8., 150
amount = price * shares
hours, minutes, seconds = 8, 18, 24
timestamp = (hours*60 + minutes)*60 + seconds
The following script has some more examples of multiple assignment. In this case, we're doing some algebra
to compute two values at the same time. The slope, m, and the intercept, b, both depend on the two points,
and can be computed at the same time.
Here's a short example that you can save, named line.py. In IDLE, you can run this using the Run
Module item on the Run menu.
line.py
# Compute line between two points.
from __future__ import print_function, division
x1,y1 = 2,3 # point one
x2,y2 = 6,8 # point two
m,b = float(y1-y2)/(x1-x2), y1-float(y1-y2)/(x1-x2)*x1
print("y=", m, "*x+", b )
This program sets variables x1, y1, x2 and y2. Then we computed m and b from those four variables. Then
we printed the m and b.
The All-At-Once Rule. The basic rule is that Python evaluates the entire right-hand side of the assignment statement. Then it matches values with destinations on the left-hand side. If the lists are different
lengths, an exception is raised and the program stops.
Because of the complete evaluation of the right-hand side, the following construct works nicely to swap
the values of two variables. This is often quite a bit more complicated in other languages.
a,b = 1,4
b,a = a,b
print(a, b)
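Two more consequences of the all-at-once rule can be sketched briefly. The first shows that every right-hand expression sees the old values. The second shows the exception for mismatched lengths; it uses the try and except statements, which are covered later, only to catch the error so the script can report it. The name mismatch_detected is ours.

```python
# Both right-hand expressions are evaluated with the OLD a and b.
a, b = 1, 4
a, b = b, a + b
print(a)    # 4
print(b)    # 5, computed from the old a (1) plus the old b (4)

# Two destinations, three values: Python raises an exception.
try:
    x, y = 1, 2, 3
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print(mismatch_detected)
```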
In Doubles, Triples, Quadruples : The tuple we'll see even more uses for this feature.
Tip: Debugging Multiple Assignment Statements
There are three common mistakes in the multiple assignment statement. The first is to choose an illegal
variable name. If you get a SyntaxError: can't assign to literal or SyntaxError: invalid syntax
the most likely cause is an illegal variable name.
One other mistake is to have an invalid expression on the right side of the assignment operator. If the result
of an assignment statement doesn't look right, remember that you can always enter the various expressions
directly into IDLE's Python shell to examine the processing one step at a time.
The third mistake is to have a mismatch between the number of variables on the left side of the = and the
number of expressions on the right side.
The first two inputs are complete statements, so Python provides no response. Our final calculation of area
didn't produce a response, so we had to provide a simple expression, area, to see our answer.
It turns out that there's a subtle bug in this. It was hidden from us because of the silent execution of
statements.
The solution is based on a built-in feature of Python. When you simply enter an expression, Python always
assigns the result to the implicit result variable, named _. It's as though you typed the following around
each expression.
_ = expression
print(_)
A Longer Conversation. Here's how we use the implicit results variable. We type expressions, and if
the result is helpful we save the result (_) in a new variable.
>>> 335/113.0
2.9646017699115044
>>> 355/113.0
3.1415929203539825
>>> pi=_
>>> pi*2.2**2
15.205309734513278
>>> area=_
Our first expression had an error. We fixed that error and saved the correct implicit result into pi by saying
pi=_. When we finished, we saved the last implicit result into area with area=_.
This comes in handy when you are exploring something rather complex.
Debugging Only. It's important to note that the _ trick only works when we're using Python interactively
at the >>> prompt. We can't (and shouldn't) write this in our script files. Our scripts will simply assign
expressions to variables directly.
Do not take great pains to line up assignment operators vertically. The following has too much space, and
is hard to read, even though it is fussily aligned. It is considered a poor way to write Python.
a                 = 12
b                 = a*math.log(a)
aVeryLongVariable = 26
d                 = 13
This is considered poor form because Python takes a lot of its look from natural languages and mathematics.
This kind of horizontal whitespace is hard to follow: it can get difficult to be sure which expression lines
up with which variable. Python programs are meant to be reasonably compact, more like reading a short
narrative paragraph or short mathematical formula than reading a page-sized UML diagram.
Variable names are typically lower_case_with_underscores or mixedCase. Variable names typically
begin with lower-case letters.
In addition, the following special forms using leading or trailing underscores are important to recognize:
single_trailing_underscore_: used to avoid conflicts with Python keywords. For example: print_ = 42
__double_leading_and_trailing_underscore__: used for special objects or attributes, e.g. __init__,
__dict__ or __file__. These names are reserved; do not use names like these in your programs unless
you specifically mean a particular built-in feature of Python.
the Terminal (or Command Prompt) where Python was started will manage reading characters from
your keyboard.
Raw Input. By "raw", we mean one line of input. The input is what the person typed after handling
backspaces. It isn't every keystroke; it's the finished product of typing and backspacing.
The details of how backspacing is handled are actually part of the operating system. Python depends on the
OS to provide the line of input via the standard input file, making sure that the backspace operates as we
expect.
We have to talk about the raw_input() function from two distinct points of view.
What you see. You will see a prompt on standard output (usually the console) and you can respond
to that prompt by typing on standard input. This console will usually be the Python Shell window of
IDLE, or the terminal window, depending on how you are running Python.
What you say in Python. The Python script evaluates a function; the value of that function will
be a string that contains the characters someone typed. We'll take a very close look at strings in
Sequences of Characters : str and Unicode. For now, we do a few simple things with strings.
An Example Script. Here's a very small script that uses raw_input() to produce a prompt, read a line and
print the result. This will give you a sense of the two worlds in which this function lives: the world of user
interaction as well as the world of Python function evaluation.
Create the following file, named rawdemo.py. Save it and then run it in IDLE using the Run Module item
in the Run menu.
rawdemo.py
# get the user's answer
from __future__ import print_function
answer= raw_input( "continue?" )
print("You said:", answer)
This program begins by evaluating the raw_input() function. When raw_input() is applied to the parameter of "continue?", it writes the prompt on standard output, and waits for a line of input.
We entered why not? as our response. When we hit enter, we told the operating system that our input line was complete.
The OS hands the completed input line to Python. Python then returns this string as the value of the
raw_input() function.
Our program saved the value from raw_input() in the variable answer. The second statement printed that
variable.
Definition. Here's a formal definition for the raw_input() function.
raw_input([prompt]) -> string
Read a string from standard input. If a prompt is given, it is printed before reading. If the user hits
end-of-file (Ctrl-D in GNU/Linux or MacOS; Ctrl-Z in Windows), an exception is raised.
There's a second input function, named input(). In Python 2 it evaluates whatever the user types as a Python
expression. That behavior is removed in Python 3 because it's worse than useless. It's confusing and a
potential security nightmare.
Making Raw Input Useful. If we want a numeric value, we must convert the resulting string to a number.
In the following example, we'll use the int() and float() functions to convert the strings we got from the
raw_input() function into numbers that we can use for calculation.
We'll use the raw_input() and int() functions to get a number of shares. The resulting number is assigned
the name shares. Then the program uses the raw_input() and float() functions to get the price.
Create the following file, named stock.py. Save it and then run it in IDLE using the Run Module item
in the Run menu.
stock.py
# Compute the value of a block of stock
from __future__ import print_function
shares = int( raw_input("shares: ") )
price = float( raw_input("dollars: ") )
print("value", shares * price)
Exceptional Input. Exceptions, in Python, often mean "exceptionally bad". The raw_input() mechanism
has some limitations. If the string returned by raw_input() is not suitable for use by the int() function, an
exception is raised and the program stops running.
Here's what it looks like when we ran stock.py and provided inappropriate input values.
We'll look at ways to handle this in The Unexpected : The try and except statements.
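As a minimal sketch of what goes wrong, the conversion can be tried on a fixed string instead of live input. The string "a few" is a made-up stand-in for a badly typed response, and the try and except statements are used here only to capture the message so the script keeps running.

```python
# A stand-in for what raw_input() might return when the user types words.
typed = "a few"
try:
    shares = int(typed)
    problem = None
except ValueError as error:
    problem = str(error)   # the text of the message Python would display
print(problem)
```

Without the try statement, the same ValueError would stop the script at the int() conversion.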
Professional quality software doesn't make much use of these functions. Typically, interactive programs use
a complete graphic user interface (GUI), often written with the Tkinter module or the pyGTK module. Both
of these are beyond the scope of this book: they aren't newbie-friendly modules.
C = input('Celsius: ')
F = 32+C*float(9/5)
print("celsius", C, "fahrenheit", F)
1. Stock Value.
Input the number of shares, dollar price and number of 8ths. From these three inputs, compute the
total dollar value of the block of stock.
2. Convert from C to F.
Write a short program that will input C and output F. A second program will input F and output
C.
3. Periodic Payment.
Input the principal, annual percentage rate and number of payments. Compute the monthly payment.
Be sure to divide rate by 12 and multiply payments by 12.
4. Surface Air Consumption Rate.
Write a short program that will input the starting pressure, final pressure, time and maximum depth.
Compute and print the SACR.
A second program will input a SACR, starting pressure, final pressure and depth. It will print the
time at that depth, and the time at 10 feet more depth.
5. Wind Chill.
Input a temperature and a wind speed. Output the wind chill.
6. Force from a Sail.
Input the height of the sail and the length. The surface area is 1/2 × h × l. For a wind speed of 25 MPH,
compute the force on the sail. Small boat sails are 25-35 feet high and 6-10 feet long.
del target [ , ... ]
A target is a name of a Python object: a variable, function, module or other object. The variable is removed.
Generally, this also means the target object is removed from memory.
The del statement works by unbinding the name, removing it from the set of names known to the Python
interpreter. If this variable was the only reference to an object, the object will be removed from memory
also. If, on the other hand, other variables still refer to this object, the object won't be deleted.
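The unbinding behavior can be sketched with two names for one object. The names first, second and name_still_known are ours, the list is only a convenient object to label (lists are covered later), and try and except are used only to show that the name is gone.

```python
# Two names for one list object.
first = [1, 2, 3]
second = first
del first              # unbind the name first; the object is still labeled second
print(second)          # the list survives because second still refers to it
try:
    first              # the name is no longer known to Python
    name_still_known = True
except NameError:
    name_still_known = False
print(name_still_known)
```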
CHAPTER
SIX
SOME SELF-CONTROL
Making Choices, Doing It All
This section represents a significant milestone. Up until this part, we have presented Python as a souped-up
desk calculator. This part shows the essential elements of building automated data processing.
We'll start with truth and logic in Truth and Logic : Boolean Data and Operators. This is an extension
to the expressions and numeric types we started out with. We'll add another data type, boolean, and a
number of operators, including comparisons and basic logic of and, or and not. With this foundation in
logic, we can introduce comparisons in Making Decisions : The Comparison Operators.
The basic tools of logic and comparison are the essential ingredients for looking at conditional processing
in Processing Only When Necessary : The if Statement. Conditional processing is controlled by the if
statement, and reflects processing that only makes sense when a condition is true.
The other side of this is iterative processing, which we'll cover in While We Have More To Do : The for
Statement. Iterative processing can depend on a condition similar to an if statement. We'll look at this
in While We Have More To Do : The while Statement.
We'll cover a number of additional topics in Becoming More Controlling. This includes the break, continue
and assert statements to provide a finer level of control over the processing. Additionally, we'll look at
many of the traps and pitfalls associated with iterative processing.
In Comments and Scripts we'll return to scripting to make our scripts fit more smoothly with our operating
system. We'll look at a complete family tree of processing alternatives using the command line as well as the
GUI. There are a number of operating-system specific variations on this theme, and we can't easily cover
every alternative.
6.1.1 Truth
The domain of arithmetic involves a large number of values: there are billions of integer values and an
almost unlimited range of long integer values. There are also a wide variety of operations, including addition,
subtraction, multiplication, division, remainder and others, too numerous to mention. The domain of logic
involves two values: False and True, and a few operations like and, or and not.
This world of logic bridges language, philosophy and mathematics. Because we deal with logic informally all
the time, it can seem needless to define a formal algebra of logic. Computers, however, are formally defined
by the laws of logic, so we can't really escape these definitions. If you don't have any experience with formal
logic, there's no call for panic: with only two values (true and false), how complex can the subject be?
Mostly, we have to be careful to follow these formal definitions, and set aside the murky English-language
idioms that masquerade as logic.
We also have to be careful to avoid confusing logic and rhetoric. A good public speaker often uses rhetorical
techniques to make their point. In some cases, the rhetoric will involve logic, but other times, it will
specifically avoid logic. One example is to attack the speaker personally, rather than attack the logic behind
the point they're trying to make. Political debates include many examples of rhetorical techniques that have
a certain kind of logic, but aren't grounded in the kind of formal mathematical logic that we're going to
present here.
Truth. Python has a number of representations for truth and falsity. While we're mostly interested in the
basic Python literals of False and True, there are several alternatives.
False.
Also 0, the complex number 0+0j, the special value None, zero-length strings "", zero-length lists [],
zero-length tuples (), and empty mappings {} are all treated as False. We'll return to these list, tuple
and map data structures in later chapters. For now, we only need to know that a structure that is
empty of meaningful content is effectively False.
True.
Anything else that is not equivalent to False. This means that any non-zero number, or any string
with a length of one or more characters, is equivalent to True.
What about maybes and unknowns? You'll need a good book on more advanced logic systems if you
want to write programs that cope with shades of meaning other than simple true and false. This kind of
"fuzzy logic" isn't built in to Python. You could write your own extension module to do this.
The bool Function. Python provides a factory function to provide the truth value of any of these objects.
In effect, this collapses any of the various forms of truth down to one of the two explicit objects: True or
False.
bool(object) -> boolean
Returns True when the argument object is equivalent to true, False otherwise.
We can see how this works with the following examples.
>>> bool(1)
True
>>> bool(0)
False
>>> bool( "a string" )
True
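The same function can be applied to each of the "empty" values listed earlier. This is a small sketch; the names falsy and truthy are ours, and the square brackets simply collect the results so they print on one line.

```python
# Every "empty" value collapses to False.
falsy = [bool(0), bool(0+0j), bool(None), bool(""), bool([]), bool(()), bool({})]
print(falsy)

# Anything with content collapses to True, even a string holding one space
# or a list holding a single zero.
truthy = [bool(-1), bool(0.5), bool(" "), bool([0])]
print(truthy)
```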
Historical Note
Historically, Python didn't have the boolean literals True and False. You may find older open-source
programs that define variables to have values that mean True and False. You might see a cryptic
dance that looks something like the following:
try:
    True, False
except NameError:
    True, False = (1==1), (1==0)
This little trick is no longer necessary. We present it here so that you won't be surprised by seeing it
in an open source package you're reading.
6.1.2 Logic
Python provides three basic logic operators that work on the domain of True and False values: not, and
and or. This domain and the applicable operators form a complete algebraic system, sometimes called a
Boolean algebra, after the mathematician George Boole.
In Python parlance, the data values of True and False, plus the operators not, and and or define a data
type. In Simple Arithmetic : Numbers and Operators we saw a number of numeric data types, and we'll look
at yet more data types as we learn more about Python.
Truth Tables. The boolean data type has only two values, which means that we can define the boolean
operators by enumerating all of the possible results in a table. Each row of the table has a unique combination
of True and False values, plus the result of applying the logic operator to those values. There are only four
combinations, so this is a pretty tidy way to define the operators.
We wouldn't want to try this for integer multiplication, since we have almost four billion integer values
(including both negative and positive values), which would lead to a table that enumerates all 18 quintillion
combinations.
Here's an example of a truth table for some hypothetical operator we'll call cake. Rather than show and, or
or not specifically, we'll use a made-up operator so we can show how a truth table is built.
This table shows all possible results for x cake y. It shows all four combinations of inputs and the result of
applying our logic operation to those values.
x      y      x cake y
True   True   True cake True = False
True   False  True cake False = True
False  True   False cake True = True
False  False  False cake False = False
The not Operator. The following little program creates a truth table that shows the value of not x for
both values of x. It may seem silly to take such care over the obvious definition that not True is False.
However, we can use this technique to help us visualize more complex logical operations.
from __future__ import print_function
print("x", "not x")
print(True, not True)
print(False, not False)
x      not x
True   False
False  True
The and Operator. This next little program creates a truth table that shows the value of x and y for all
four combinations of True and False. You can see from this table that x and y is only True if both of the
terms are True. This corresponds precisely to the English meaning of and.
from __future__ import print_function
print("x", "y", "x and y")
print(True, True, True and True)
print(True, False, True and False)
print(False, True, False and True)
print(False, False, False and False)
x      y      x and y
True   True   True
True   False  False
False  True   False
False  False  False
The or Operator. The following table shows the evaluation of x or y for all four combinations of True and
False. You can see from this table that x or y is True if one or both of the terms are True. In English,
we often emphasize the inclusiveness of this by writing and/or. We do this to distinguish it from the
English-language exclusive or (sometimes written either/or), which means one or the other but not
both. Python's x or y is the inclusive sense of or.
x      y      x or y
True   True   True
True   False  True
False  True   True
False  False  False
An important note is that and is a higher priority operator than or, analogous to the way multiplication is
higher priority than addition. This means that when Python evaluates expressions like a or b and c, the
and operation is evaluated first, followed by the or operation. This is equivalent to a or (b and c).
Tip: Debugging Logic Operators
The most common problem people have with the logic operators is to mistake the priority rules. The lowest
priority operator is or. and is higher priority and not is the highest priority. If there is any confusion, extra
parentheses will help.
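The priority rules are easy to check with one carefully chosen set of values. In this sketch the values of a, b and c are picked so that the two possible groupings give different answers; the variable names for the results are ours.

```python
a, b, c = True, False, False
default_grouping = a or b and c        # and binds tighter: a or (b and c)
explicit_same = a or (b and c)
other_grouping = (a or b) and c
print(default_grouping)   # True
print(explicit_same)      # True
print(other_grouping)     # False

# not binds tighter than and: this is (not a) and b, not the same as not (a and b)
print(not a and b)        # False
print(not (a and b))      # True
```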
Other Operators. There are theoretically more logic operators. However, we can define all of
the other logic operations using just not, and and or. Other logic operations include things like if-then and
if-and-only-if. For example, if a then b can be understood as (a and b or not a).
One of the more important additional logic operations is or in an exclusive sense, sometimes called one-or-the-other-but-not-both or exclusive or, abbreviated xor. We can understand a xor b as ((a or b) and
not (a and b)). The parentheses are required to create the correct answer.
How can we prove this? Write a short program like the following:
from __future__ import print_function
a, b = True, True
print(a, b, ((a or b) and not (a and b)))
a, b = True, False
print(a, b, ((a or b) and not (a and b)))
You'll have to repeat this for False, True and False, False combinations, also.
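Written out for all four combinations, the check might look like the following sketch. The result names tt, tf, ft and ff are ours; each holds one row of the xor truth table.

```python
a, b = True, True
tt = (a or b) and not (a and b)
a, b = True, False
tf = (a or b) and not (a and b)
a, b = False, True
ft = (a or b) and not (a and b)
a, b = False, False
ff = (a or b) and not (a and b)
print(tt)   # False: both, so not exclusive
print(tf)   # True
print(ft)   # True
print(ff)   # False: neither
```

The result is True exactly when the two inputs differ, which is what we expect from one-or-the-other-but-not-both.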
The claim that we can define all logic operations using only not, and and or is a fairly subtle piece of
mathematics. We'll just lift up a single observation as a hint to how this is possibly true. We note that
given two values and an operation, there are only four combinations of values in the truth table. There are
only 16 possible distinct tables built from four boolean values. The logic puzzle of creating each of the 16
results using only not, and and or isn't terribly hard. For real fun, you can try constructing all 16 results
using only the not-and operator, sometimes called nand.
5. Implies.
The word implies has a formal logic definition. We say a implies b as a short form of if a, then b.
We might say rain implies a wet lawn, or if it rains, then the lawn gets wet. In Python, we might
want to write a implies b if Python had a logic operator named implies. When we look at the
formal meaning of our hypothetical x implies y, we want it to be true when x and y are true. When
x is false, the truth or falsity of y doesn't really matter. We can say that implication is true when both
x and y are true or x is false.
Does (x and y) or (not x) create the correct truth table for implies?
There are two boolean values, 256 unique byte values, 4 billion unique integers. How many unique floating-point values are there?
The "almost unlimited" at the beginning of the chapter was unsatisfyingly vague.
The domain of values for floating-point numbers is technically finite. The domain depends, to a
small extent, on your computer. We'll assume 64-bit floating-point numbers. These have 2**64 distinct
values, which is 18.4 quintillion. These values, however, are spread over a range that includes 2**-1023
(approximately 10**-308) as the number closest to zero, and 2**1024 (approximately 10**308) as the number
furthest from zero.
Some computers have 80-bit floating-point numbers. The ranges in this case would obviously be
somewhat larger.
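Python can report the limits for the computer it is running on through the sys module. This sketch assumes the usual 64-bit floating-point format; the names largest and smallest_normal are ours, and the exact digits printed may vary with the platform.

```python
import sys

largest = sys.float_info.max           # about 1.8 * 10**308 with 64-bit floats
smallest_normal = sys.float_info.min   # about 2.2 * 10**-308 with 64-bit floats
print(largest)
print(smallest_normal)
print(2.0 ** 64)                       # the number of distinct 64-bit patterns
```

Note that sys.float_info.min is the smallest positive number stored with full precision. The limits reported here are close to, but not exactly, the round powers of two quoted above.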
The = and == Problem. Here's a common mistake. We've used a single = (assignment), when we meant
to use == (comparison). We get a syntax error because we have a literal 99 on the left side of the = statement.
>>> 99 = 2
File "<stdin>", line 1
SyntaxError: can't assign to literal
An Unexpected Coercion. Here's a strange thing that can happen because of the way Python converts
between numeric type data and boolean type data. First, look at the example, then try to figure out what
happened.
Because of the (), the 10 >= 9 is evaluated first. What is the result of that comparison?
How can it make sense to compute a sum of a boolean value (True or False) and a number? It doesn't,
really, but Python tries anyway. It must have converted the boolean result of 10 >= 9 to a number. Try
the following to see what has happened.
>>> (10 >= 9) + 2
3
>>> 10 >= 9
True
>>> int( 10 >= 9 )
1
>>> int( 10 >= 9 ) + 2
3
Writing 13 <= somethingComplex <= 24 instead of 13 <= somethingComplex and somethingComplex <=
24 is particularly useful when somethingComplex is actually some complex expression that we'd rather not
repeat.
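A short sketch shows that the two spellings agree. The name hours_worked and its formula are hypothetical stand-ins for an expression we'd rather not write twice.

```python
# Hypothetical stand-in for a complex expression.
hours_worked = 3 * 5 + 4             # 19
chained = 13 <= hours_worked <= 24
spelled_out = 13 <= hours_worked and hours_worked <= 24
print(chained)                       # True
print(spelled_out)                   # True
print(13 <= 3 * 9 + 4 <= 24)         # 31 is out of range: False
```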
Proper Floating-Point Comparison. Exact equality between floating-point numbers is a dangerous
concept. During a lengthy computation, round-off errors and conversion errors in floating-point numbers
will accrue a tiny error term. In Better Arithmetic Through Functions, we saw answers that were off in the
15th decimal place. These answers are close enough to be equal for all practical purposes, but one or more
of the 64 bits may not be identical.
The following technique is the appropriate way to do the comparison between floating-point numbers a and
b.
abs(a-b)/a<0.0001
Rather than ask if the two floating-point values are the same, we ask if they're close enough to be considered
the same. For example, run the following tiny program.
floatequal.py
# Are two floating-point values really completely equal?
from __future__ import print_function
a,b = 1/3.0, .1/.3
print(a, b, a==b)
diff= abs(a-b)/a
print(diff, diff < 0.0001)
The two values appear the same when printed. Yet, on most platforms, the == test returns False. They
are not identical values. They differ at the 16th digit past the decimal point.
This is a consequence of representing real numbers with only a finite amount of binary precision. Certain
repeating decimals get truncated, and these truncation errors accumulate in our calculations.
There are ways to avoid this problem; one part of this avoidance is to do the algebra necessary to postpone
doing division operations. Division introduces the largest number of erroneous bits onto the trailing edge of
our numbers. The most important part of avoiding the problem is never to compare floating-point numbers
for exact equality.
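The close-enough test can be wrapped in a small function so that it also copes with negative values and zero. This is only a sketch: the name close_enough and the default tolerance are illustrative assumptions, and dividing by the larger magnitude is one of several reasonable choices.

```python
def close_enough(a, b, tolerance=0.0001):
    """Return True when a and b differ by less than tolerance,
    measured relative to the larger of the two magnitudes."""
    scale = max(abs(a), abs(b))
    if scale == 0.0:
        return True  # both values are exactly zero
    return abs(a - b) / scale < tolerance

a, b = 1/3.0, .1/.3
exactly_equal = (a == b)           # the risky test
nearly_equal = close_enough(a, b)  # the safer test
print(exactly_equal)
print(nearly_equal)
```

Dividing by the larger magnitude means the answer does not depend on which value is passed first.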
The variable a refers to 123456789. The variable b also refers to the same object.
When we evaluate the following, the results aren't surprising.
>>> a=123456789
>>> b=a
>>> a is b
True
>>> a == b
True
The is operator tells us that variable a and variable b are two different labels attached to the same underlying
object.
The == operator tells us that variable a and variable b refer to objects which have the same numeric value.
In this case, since a is b is True, it's not surprising that a == b.
Not the Same Thing. In most cases, however, we'll have situations like the following. We'll create two
distinct objects that have the same numeric value.
>>> a=123456789
>>> c=a*1
>>> c is a
False
>>> c == a
True
>>> c is not a
True
In this example, we've evaluated a simple operator (*), which created a new object. We know it's a new
object because c is a is False (also, c is not a is True). However, this new object has the same numeric
value as a.
Common Use. The most common use for is and is not is when comparing specific objects, not generic
numeric values. We do this mostly with the object None.
>>> variable = None
>>> variable
>>> variable is None
True
>>> anotherVariable = 355/113.0
>>> anotherVariable is None
False
>>> anotherVariable
3.1415929203539825
First, we're only bending the law. The essential principle is still followed; we're just extending the rule
a little: all of the logically necessary parts of an expression are evaluated first.
The alternatives to the short-circuit sense of and and or are either much more complex logic operators
(and, or, cand and cor) or decomposing relatively simple logic into a complex sequence of statements,
using an if statement instead of a short-circuit logic operator.
The objective of software is to capture knowledge of processing in a clear and formal language. Fussy
consistency in this case doesn't help achieve clarity.
This is a three-part operator that has a condition (in the middle) and two values. If the condition is True,
then the value of the entire expression is the trueValue. If the condition is False, then the value of the entire
expression is the falseValue.
Note: Terminology
Sometimes you'll hear this called "the ternary operator". This is a confusing name, and shouldn't be used.
It's a ternary operator: there are three arguments. All other operators are unary (-a) or binary (a+b).
This happens to be the only ternary operator. There's no reason for calling it "the" ternary operator because
others could be fashioned.
Here's an example. Let's say we're monitoring the temperature of a walk-in cooler.
status = "in range" if -5 <= freezer <= 0 else "problem"
If the temperature is between -5 and 0, the status is "in range". If the temperature is outside the range, the
status will be "problem".
This is strictly a two-way true-false comparison. If you need more than two simple choices, you'll need
something more sophisticated like the complete if statement, Processing Only When Necessary : The if
Statement.
Rule-Bender. This violates the basic rule we defined in The Evaluate-Apply Rule. The general rule applies
almost everywhere else: every expression is fully evaluated. This conditional expression bends this rule,
however, to limit evaluation to the logically necessary sub-expressions rather than every single sub-expression.
Python always evaluates the condition. It then evaluates one of the two other expressions. The remaining
expression is not evaluated.
Here's another example.
average = float(sum)/count if count != 0 else 0.0
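A small sketch can confirm that the division is skipped when the condition is False. The function name average and the sample values are assumptions made only for illustration.

```python
def average(total, count):
    # Python evaluates count != 0 first. The division on the left is
    # evaluated only when that condition is True.
    return float(total) / count if count != 0 else 0.0

with_data = average(21, 6)
no_data = average(0, 0)    # no ZeroDivisionError: the division never runs
print(with_data)
print(no_data)
```

If every part of the expression were evaluated, the second call would fail with a division by zero.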
This will show you that when the left-side value is equivalent to False, that is what Python returns for and.
The other value isn't even evaluated.
Try this.
>>> import math
>>> False and math.sqrt(-1)
>>> False and 22/0
What happens?
The and operator doesn't evaluate the right-hand parameter if the value on the left-hand side is False.
The or operator, similarly, does not evaluate the right-hand parameter if the left-hand side is equivalent to
True.
Rule-Bender. This violates the basic rule we defined in The Evaluate-Apply Rule. The general rule applies
almost everywhere else: every expression is fully evaluated. The and and or operators bend this rule, however,
to limit evaluation to the logically necessary sub-expressions rather than every single sub-expression.
This short-circuit can be useful. This is an example of the "practicality beats purity" principle that makes
Python so cool.
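One way to watch the short-circuit happen is to record which operands actually get evaluated. In this sketch, the function noisy and the list calls are hypothetical names introduced for the demonstration.

```python
calls = []

def noisy(value):
    """Remember that this operand was evaluated, then hand it back."""
    calls.append(value)
    return value

# The left side is False, so and never looks at the right side.
result_and = noisy(False) and noisy("right side of and")
# The left side is True, so or never looks at the right side.
result_or = noisy(True) or noisy("right side of or")

print(calls)
```

Only the two left-hand operands appear in calls; neither right-hand operand was evaluated.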
Simplifications. Let's look at the informal logic of English for a moment. When sailing, we might say "if
the wind is over 15 knots, we reef the main sail". Reefing, for non-sailors, is a technique for reducing the
area of the sail; we do this for a variety of reasons, for example, so that the boat doesn't lean over (heel)
too far in a high wind.
One important consequence of the short-cut rule is that the Python logic operators are very handy for
creating a simple, clear statement of some sophisticated processing. One of the most notable examples of
this is an expression like the following, which summarizes our sailing rule very nicely.
full = 72 # square feet
reefed = 65 # square feet
sailArea= reefed if windSpeed > 15 else full
One More Condition. What if we are motoring, not sailing? In English, we can say something like "when
sailing and the wind is over 15 knots, we reef the main sail". The implication is that when we are motoring,
the wind-speed comparison is irrelevant. After all, it is a bit silly to also check the wind speed when we
don't even have the sails up.
If we don't think carefully, we would wind up with the following incorrect set of conditions.
Engine     Conditions             Configuration
Sailing    Wind Speed <= 15 kn    full
Sailing    Wind Speed > 15 kn     reefed
Motoring   Wind Speed <= 15 kn    full
Motoring   Wind Speed > 15 kn     reefed
The table above is silly because motoring makes the wind speed and sail positions irrelevant. Why are we
checking them needlessly?
Clarification. This table has what we really meant. This clearly states that when we're motoring, we don't
need to check the wind-speed.
Engine     Conditions             Configuration
Sailing    Wind Speed <= 15 kn    full
Sailing    Wind Speed > 15 kn     reefed
Motoring   Doesn't matter         None
This reflects the decision tree in our table. The overall condition is the if engine == "Sailing" check.
If we're sailing, then there's a second check based on the wind speed.
We can test this with little scripts like the following:
>>>
>>>
72
>>>
>>>
>>>
When sailing in light air (10 knots), we should have the full sail, all 72 square feet deployed.
When motoring in a stiff breeze (25 knots), we should have no sail.
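The two checks in the corrected table can be sketched as one conditional expression nested inside another. This is an assumption about how the rule might be written, not the book's own listing; the function name sail_area is hypothetical.

```python
full = 72     # square feet
reefed = 65   # square feet

def sail_area(engine, wind_speed):
    """Pick a sail configuration; None means no sail is up."""
    # The wind speed is only examined when we are actually sailing.
    return (reefed if wind_speed > 15 else full) if engine == "Sailing" else None

light_air = sail_area("Sailing", 10)       # sailing in 10 knots
stiff_breeze = sail_area("Motoring", 25)   # motoring in 25 knots
print(light_air)
print(stiff_breeze)
```

The outer condition is the engine check; the inner condition is the wind-speed check, matching the two levels of the table.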
Develop a similar technique using or instead of and. This will require some reversals of the logic in
the above example. We can interpret it as doing a comparison or a calculation. If the first clause (the
comparison) is false, we want to continue on to the next clause (the calculation).
This is an application of De Morgan's Laws.
2. Hardways.
Assume d1 and d2 have the numbers on 2 dice. A hardways proposition is 4, 6, 8, or 10 with both
dice having the same value. It's the "hard way" to get the number. A hard 4, for instance, is d1+d2 ==
4 and d1 == d2. An easy 4 is d1+d2 == 4 and d1 != d2.
You win a hardways bet if you get the number the hard way. You lose if you get the number the easy
way or you get a seven. Write the winning and losing condition for one of the four hard ways bets.
The word if and the : are essential syntax. The suite is an indented block of one or more statements. Any
statement is allowed in the block, including indented if statements. You can use either tabs or spaces for
indentation. The usual style is four spaces, and we often set BBEdit or TextPad to treat the tab key on our
keyboard as four spaces.
This is the first compound statement we've seen. A compound statement makes use of the essential
syntax rules we looked at in Long-Winded Statements. It also uses two additional syntax rules, that we'll
look at next.
Semantics. The if statement evaluates the condition expression first. When the result is True, the suite of
statements is executed. Otherwise the suite is skipped. Let's look at some examples.
Here we have a typically complex expression. Here's how the if statement works.
1. The expression d1+d2 == 7 or d1+d2 == 11 is evaluated.
The or operator evaluates the left side of the or operation first; if this is False, it will then evaluate
the right side. If the left side is True, the result is True.
2. The expression d1+d2 == 7 is evaluated.
3. If this value is True (d1 + d2 really is 7), the entire or expression is True and evaluation of the
expression is complete.
4. If the left side is False, then the right side is evaluated. The value of the right side (d1+d2 == 11) is
the value for the entire or operation. This could be True or False.
5. If the value of the expression is True, the suite is executed, which means that a message is printed.
If the expression is False, the suite is skipped.
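The steps above can be sketched with a small function so that several rolls can be tried. The function name come_out_message and the message text are assumptions for illustration; only the condition comes from the text.

```python
def come_out_message(d1, d2):
    message = None
    # The left comparison is evaluated first; the right comparison is
    # evaluated only when the left one is False.
    if d1 + d2 == 7 or d1 + d2 == 11:
        message = "winner"    # the suite runs only when the condition is True
    return message

print(come_out_message(3, 4))    # a 7: the left side is True
print(come_out_message(5, 6))    # an 11: the right side is True
print(come_out_message(1, 1))    # a 2: both sides are False, the suite is skipped
```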
Syntax Help from IDLE. The suite of statements inside the if statement is set apart from other statements
by its indentation. This means you have to indent the statements in the suite consistently. Any change to
the indentation is, in effect, another suite or the end of this suite.
The good news is that IDLE knows this rule and helps us by automatically indenting when we end a line
with a :. It will automatically indent until we enter a blank line. Here's how it looks in IDLE. In order to
show you precisely what's going on, we're going to replace normally invisible spaces with _ characters.
>>>_if_1+2_==_3:
...____print("good")
...
good
1. You start to enter the if statement. When you type the letter f, the color of if changes to orange, as
a hint that IDLE recognizes a Python statement. You hit enter at the end of the first line of the if
statement.
2. IDLE indents for you. You type the first statement of the suite of statements.
3. IDLE indents for you again. You don't have any more statements, so you hit enter. The statement
is complete, so IDLE executes the statement.
4. This is the output. Since 1+2 does exactly equal 3, the suite of statements is executed.
Important: Syntax Rule Eight
Compound statements, including if, while, for, have an indented suite of statements. You have a number
of choices for indentation; you can use tab characters or spaces. While there is a lot of flexibility, the most
important thing is to be consistent.
We'll show an example with spaces shown via _.
a=0
if_a==0:
____print("a_is_zero")
else:
____print("a_is_not_zero")
Heres an example with spaces shown via _ and tabs shown with :
if_a%2==0:
print("a_is_even")
else:
print("a_is_odd")
While the tab character is allowed, spaces are preferred. Many experienced Python programmers set their
text editors to replace tab characters with four spaces.
Semantics. Python treats the if and elif sequence of statements as a single, big statement. Python
evaluates the if expression first; if it is True, Python executes the if suite and the statement is done; the
elif suites are all ignored. If the initial if expression is False, Python looks at each elif statement in order.
If an elif expression is True, Python executes the associated suite, and the statement is done; the remaining
elif suites are ignored. If none of the elif expressions are true, then nothing else happens.
Complete Come Out Roll. Here is a somewhat more complete rule for the come out roll in a game of
Craps:
import random
d1, d2= random.randint(1,6), random.randint(1,6)
result= None
if d1+d2 == 7 or d1+d2 == 11:
    result= "winner"
elif d1+d2 == 2 or d1+d2 == 3 or d1+d2 == 12:
    result= "loser"
print(result)
If neither condition is true, the if statement has no effect. The script prints the result, which will be None.
The Roulette Example. Here's the even-odd rule from Roulette. We have one subtlety in Roulette that
we have to look at: the problem of zero and double zero. What we'll do is generate random numbers between
-1 and 36. We'll treat the -1 as if it was 00, which is like 0, neither even nor odd.
from __future__ import print_function
import random
spin= random.randint(-1,36)
result= None
if spin == 0 or spin == -1:
    result= "neither"
elif spin % 2 == 0:
    result= "even"
elif spin % 2 == 1:
    result= "odd"
print(spin, result)
This clause is always last and, effectively, always True. When the if expression and all of the elif expressions
are false, Python will execute any else suite that we provide.
Come Out Roll Script. Here's the complete come-out roll rule. In this final example, we've added the
necessary import and assignment statements to make a complete little script.
comeoutroll.py
from __future__ import print_function
import random
d1,d2= random.randrange(1,7), random.randrange(1,7)
point= None
result= None
if d1+d2 == 7 or d1+d2 == 11:
    result= "winner"
Here, we used the else suite to handle all of the other possible rolls. There are six different values (4, 5, 6,
8, 9, or 10), a tedious typing exercise if you write it out using or. We summarize this complex condition
with the else clause.
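As a minimal sketch of an else suite covering the six point numbers, the rule can be written as a function that takes the dice as parameters, which makes it easy to try specific rolls. This is not the book's comeoutroll.py listing; the function name come_out and the "point" label are assumptions.

```python
def come_out(d1, d2):
    """Return (result, point) for a come-out roll of two dice."""
    point = None
    if d1 + d2 == 7 or d1 + d2 == 11:
        result = "winner"
    elif d1 + d2 == 2 or d1 + d2 == 3 or d1 + d2 == 12:
        result = "loser"
    else:
        # Everything else: 4, 5, 6, 8, 9 or 10 establishes the point.
        point = d1 + d2
        result = "point"
    return result, point

print(come_out(3, 4))
print(come_out(1, 2))
print(come_out(4, 4))
```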
Tip: Debugging the if statement.
If you are typing an if statement, and you get a SyntaxError: invalid syntax, you probably omitted the :.
A common problem with if statements is an improper condition. You can put any expression in the if or elif
statement. If the expression doesn't have a boolean value, Python will use the bool() function to determine
if the expression amounts to True or False. It's far better to have a clear boolean expression rather than
trust the rules used by the bool() function.
One of the more subtle problems with the if statement is being absolutely sure of the implicit condition that
controls the else clause. By relying on an implicit condition, it is easy to overlook gaps in your logic.
Consider the following complete if statement that checks for a winner on a field bet. A field bet wins on 2,
3, 4, 9, 10, 11 or 12. The payout odds are different on 2 and 12.
outcome= 0
if d1+d2 == 2 or d1+d2 == 12:
    outcome= 2
    print("field pays 2:1")
elif d1+d2==4 or d1+d2==9 or d1+d2==10 or d1+d2==11:
    outcome= 1
    print("field pays even money")
else:
    outcome= -1
    print("field loses")
Here's the subtle bug in this example. We test for 2 and 12 in the first clause; we test for 4, 9, 10 and 11
in the second. It's not obvious that a roll of 3 is missing from the "field pays even money" condition. This
fragment incorrectly treats 3, 5, 6, 7 and 8 alike in the else:.
While the else: clause is used commonly as a catch-all, a more proper use for else: is to raise an exception
because a condition was found that was not matched by any of the if or elif clauses.
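A sketch of that stricter style follows: every expected roll gets an explicit condition, and the else clause only reports the impossible case. The function name field_bet and the choice of ValueError are assumptions; the payouts come from the text above.

```python
def field_bet(d1, d2):
    """Return the field bet outcome: 2, 1 or -1 times the amount bet."""
    roll = d1 + d2
    if roll == 2 or roll == 12:
        outcome = 2       # field pays 2:1
    elif roll == 3 or roll == 4 or roll == 9 or roll == 10 or roll == 11:
        outcome = 1       # field pays even money; 3 is now included
    elif 5 <= roll <= 8:
        outcome = -1      # field loses
    else:
        # No clause matched: the dice were not valid.
        raise ValueError("unexpected roll: %r" % (roll,))
    return outcome

print(field_bet(1, 2))
print(field_bet(3, 4))
```

With the losing rolls spelled out, a forgotten number shows up as an exception instead of being silently counted as a loss.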
Unfortunately, the Python language doesn't allow a suite of statements to be empty. We don't want to have
to rearrange our program's logic to suit a limitation of the language. We want to express our processing
clearly and precisely. Enter the pass statement.
The syntax is trivial.
pass
The pass statement does nothing. It is essentially a syntax place-holder that allows us to have a "do nothing"
suite embedded in an if statement.
Here's how it looks.
if a > 12:
    pass
elif a == 0:
    print("zero")
else:
    print(a, "between 1 and 12")
If the value of a is greater than 12, the if statement's expression is true, and Python executes the first suite
of statements. That suite is simply pass, so nothing happens.
If the value of a is zero, the first elif statement's expression is true, and Python executes the second suite
of statements. That suite is a print() function, and we see a zero printed.
If none of the previous expressions are true, Python falls back to the else statement, in which case, we would
see a message about a being between 1 and 12.
4. Hardways Roll.
Accept d1 and d2 as input. First, check to see that they are in the proper range for dice. If not, print
a message.
Otherwise, check for a hard ways bet pay out. Hard 4 and 10 pay 7:1; hard 6 and 8 pay 9:1; easy 4,
6, 8 or 10, or any 7 loses. For everything else, the bet still stands.
5. Partial Evaluation.
This partial evaluation of the and and or operators appears to violate the evaluate-apply principle
espoused in Execution Two Points of View. Instead of evaluating all parameters, these operators
seem to evaluate only the left-hand parameter before they are applied. Is this special case a problem?
Can these operators be removed from the language, and replaced with the simple if-statement? What
are the consequences of removing the short-circuit logic operators?
Initial Condition. Any initial values of F and C are valid for the start of this program.
When we reverse this list of goals, we have the algorithm for computing a mapping between the Fahrenheit
temperature and the Celsius temperature.
We often call iterative or repetitive processing a loop because the program statements are executed in a
kind of loop. Both the for and while statements provide a condition that controls how many times the loop
is executed. The condition in the for statement is trivial, but the pattern is so common that it has a large
number of uses. The condition in the while statement is completely open-ended, therefore it requires a little
more care when designing the statement.
Iterative processing relies on all the elements of sequential and conditional processing that we've already seen.
Iterative programming is the backbone of computing. First we'll look at three common kinds of iteration.
Then we'll see how to write those kinds of iteration in Python.
Here's the result of a small program that produces a mapping from Swedish Krona (SEK) to US Dollars
(USD); it's a currency exchange table. A Krona is worth about $0.125 right now. We used a Python "for
all" loop to iterate through all values from 5 to 50 in steps of 5.
5 0.625
10 1.25
15 1.875
20 2.5
25 3.125
30 3.75
35 4.375
40 5.0
45 5.625
50 6.25
The result of a mapping is an output set of values; the size of the output set matches the size of the input
set. In our example above, we have as many Krona values as Dollar values.
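A mapping like the exchange table can be sketched with a for statement and range(). The program that produced the table is not shown in the text, so the variable names here are assumptions; the rate and the range of values come from the description above.

```python
rate = 0.125   # dollars per Krona, as quoted above
table = []     # keep each (SEK, USD) pair so we can inspect the mapping later

for sek in range(5, 55, 5):    # 5, 10, ..., 50
    usd = sek * rate
    table.append((sek, usd))
    print("%d %s" % (sek, usd))
```

The list named table ends up with one output pair for each input value, which is the defining property of a mapping.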
Reducing All Values To One Value. Another common kind of iteration is a reduction where all the
values in a set are reduced to a single resulting value. When we add up or average a column of numbers,
we're doing a reduction. As we look at our two representative problems (see Two Minimally-Geeky Problems
: Examples of Things Best Done by Customized Software), we see that we will be simulating casino games
and computing averages of the results of a number of simulation runs.
The for statement is ideal for performing reductions. The typical pattern for a reduction uses an
initialization, a "for all" loop, and one or more statements to update the reduction.
Initialize the Reductions
    total= 0
    count= 0
For all values i in some set:
    Update the Reduction
        total += calculation based on i
        count += 1
The result of a reduction is a single number created from the input set of values. Common examples are
the sum, average, minimum or maximum. It could also be a more sophisticated reduction like the statistical
median or mode.
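The reduction pattern can be sketched with a handful of made-up dice totals. The list rolls is invented sample data, not the result of any simulation in the text.

```python
rolls = [7, 4, 11, 6, 8]   # made-up sample values

# Initialize the reductions.
total = 0
count = 0

# For all values in the set, update the reductions.
for roll in rolls:
    total += roll
    count += 1

# Two more reductions derived from the first two.
average = float(total) / count if count != 0 else 0.0
print(total)
print(count)
print(average)
```

Five input values are reduced to a single sum, a single count and a single average.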
Filtering All Values To Find a Subset. The third common kind of iteration is a filter where the iteration
picks a subset of the values from all values in a set. For instance, we may want a filter that keeps only even
numbers, or only red numbers in Roulette.
In this case, we're introducing a condition, which makes a filter more complex than the map or reduce. A
filter combines iteration and conditional processing.
There are two senses of filtering:
Find all values that match the condition.
Find some value that matches the condition. This is a slightly more complex case, and we'll return to
it several times.
These are sometimes lumped under the category of "search". Search is so important that several computer
science books are focused on just this subject.
When we look closely at the rules for Craps we see that a game is a kind of filter. Once the game has
established a point, the rest of the game is a kind of filter applied to a sequence of dice rolls that ignores
dice rolls except for 7 and the point number. We can imagine adding filter conditions; for example, we could
add a filter to keep the dice rolls that win a hardways bet.
The while statement can be used for filtering. Additionally, the break and continue statements can
simplify very complex filters. The typical pattern for a filter uses an "initialize", a "while not finished",
a "filter condition" and an "update the results".
Initialize the Results
    result = None
While Not Finished Filtering:
    Filter Condition
        If condition based on i :
            Update the results
                result = ...
The result of a filter is a subset of the input values. It may be the original set of values, in the rare case that
every value passes the filtering test. It may be a single value if we are searching for just one occurrence of a
value that passes the filter.
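Both senses of filtering can be sketched over a short list of made-up Roulette spins. The list spins and the variable names are assumptions; the condition is the even-number rule, with 0 counted as neither even nor odd.

```python
spins = [0, 17, 32, 8, 19, 36, 5]   # made-up sample values

# Sense 1: find all values that match the condition.
evens = []
for spin in spins:
    if spin != 0 and spin % 2 == 0:
        evens.append(spin)

# Sense 2: find some value that matches the condition, then stop looking.
first_even = None
i = 0
while i != len(spins):
    if spins[i] != 0 and spins[i] % 2 == 0:
        first_even = spins[i]
        break              # finished: no need to examine the rest
    i += 1

print(evens)
print(first_even)
```

The first loop always visits every value; the second can stop early, which is why the while statement with break suits a search.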
The words for and in and the : are essential syntax. The suite is an indented block of statements. Any
statement is allowed in the block, including indented for statements.
There are a number of ways of creating the necessary sequence of values. The most common way to create a
sequence is to use the range() function. First we'll look at the for statement, then we'll provide a definition
for the range() function.
Printing All The Values. This first example uses the range() function to create a sequence of six values
from 0 to just before 6. The for statement iterates for all values of the sequence, assigning each value to the
local variable i. For each of six values of i, the suite of statements inside the for statement is executed.
The suite of statements is just a print() function, which has an expression that adds one to i and prints
the resulting value.
for i in range(6):
    print(i+1)
We can summarize this as "for all i in the range of 0 to one before 6, print i+1".
Using A Literal Sequence Display. We can also create the sequence manually, using a literal sequence
display. A sequence display looks like this: [ expression , ... ]. It's a list of expressions; for now they should
be numbers separated by commas. The square brackets are essential syntax for marking a sequence. We'll
return to sequences in Basic Sequential Collections of Data.
This example uses an explicit sequence of values. These are all of the red numbers on a standard Roulette
wheel. It then iterates through the sequence, assigning each value to the local variable r. The print()
function prints all 18 values followed by the word red.
for r in [1,3,5,7,9,12,14,16,18,19,21,23,25,27,30,32,34,36]:
    print(r, "red")
Summing All The Values. The second example sums a sequence of five odd values from 1 to just before
10. The for statement iterates through the sequence, assigning each value to the local variable j. After the
loop finishes, the print() function prints the sum.
sum= 0
for j in range(1,5*2,2):
    sum += j
print(sum)
The range() Function. The range() function has two optional parameters, meaning it has three forms.
range(x) -> sequence
    Generates values from 0 to x-1, incrementing by 1.
range(x, y) -> sequence
    Generates values from x to y-1, incrementing by 1. Each value, v, will have the property x <= v < y.
range(x, y, z) -> sequence
    Generates values from x up to, but not including, y, incrementing by z. The values will be x, x + z,
    x + 2z, ..., x + kz, where x + kz < y.
From this we can see the following features of the range() function. If we provide one value, we get a
sequence from 0 to just before the value we provided. If we provide two values we get a sequence from the
starting value to one before the ending value. If we provide three values, the third value is the increment
between each value in the sequence.
>>> range(6)
[0, 1, 2, 3, 4, 5]
>>> range(1,7)
[1, 2, 3, 4, 5, 6]
>>> range(1,11,2)
[1, 3, 5, 7, 9]
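The session above shows Python 2 behavior, where range() builds a list directly. Under Python 3, range() returns a lazy range object instead, so a sketch that wants to display the values wraps each call in list(). The variable names are illustrative; the same code also runs under Python 2.

```python
one_value = list(range(6))             # 0 up to just before 6
two_values = list(range(1, 7))         # 1 up to just before 7
three_values = list(range(1, 11, 2))   # 1 up to just before 11, in steps of 2

print(one_value)
print(two_values)
print(three_values)
```

A for statement accepts the range object as it is, so list() is only needed when we want to look at the whole sequence at once.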
We can interpret this as a mapping from two dice to the sum of those two dice. This is a kind of
two-dimensional table with one die going down the rows and one die going across the columns. Each cell of the
table has the sum written in.
The output from this example, though, doesn't look like a table because it's written down the page, not
across the page. To write across the page, we can make use of a feature of the print() function. We'll
manually set the end-of-line to ',' or '\n'.
table.py (listing omitted; the notes below refer to lines 2, 3, 6 and 7 of the script)
2. This is the first line of our table, showing the column titles.
3. Here we print the header for each row. Since this print sets end to ' ', this does not print a complete
line.
6. We print a single cell. Since this print() function sets end to ' ', this does not finish the output line.
7. This print() function does not set end. The default value is '\n'. Therefore, this is the end of the
line. The preceding row label and 6 values will be a complete line.
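The notes above can be illustrated with a sketch of a dice-sum table. This is not the original table.py, and its line numbers will not match the notes; the variable names and column widths are assumptions. It is written for Python 3; under Python 2 the script would need from __future__ import print_function as its first line.

```python
table = []                      # also keep the sums so they can be inspected

print("  ", end=' ')            # blank corner above the row labels
for d2 in range(1, 7):
    print("%2d" % d2, end=' ')  # column titles, all on one line
print()                         # default end='\n' finishes the title line

for d1 in range(1, 7):
    print("%2d" % d1, end=' ')  # row label; the line is not finished yet
    row = []
    for d2 in range(1, 7):
        row.append(d1 + d2)
        print("%2d" % (d1 + d2), end=' ')   # one cell; still the same line
    print()                     # end of this row
    table.append(row)
```

Each print() call with end=' ' adds to the current line; the bare print() is what moves the output to the next row.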
This previous example is a mapping from the sample number (i) to two random dice (d1, d2), and then the
two dice are mapped to a single sum.
We'll expand this simple loop to do some additional processing in While We Have More To Do : The while
Statement.
Development Cost, where E is effort in staff-months, R is the billing rate. C is the cost in dollars
(assuming 152 working hours per staff-month)
$C = E \times R \times 152$
Project Duration, where E is effort in staff-months. D is duration in calendar months.
$D = 2.5 \times E^{0.38}$
Staffing, where E is effort in staff-months, D is duration in calendar months. S is the average staff size.
$S = \frac{E}{D}$
Evaluate these functions for projects which range in size from 8,000 lines (K = 8) to 64,000 lines (K
= 64) in steps of 8. Produce a table with lines of source, Effort, Duration, Cost and Staff size.
2. Wind Chill Table.
Used by meteorologists to describe the effect of cold and wind combined. Given the wind speed in
miles per hour, V, and the temperature in °F, T, the Wind Chill, w, is given by the formula below. See
Wind Chill in Expression Exercises for more information.
$w = 35.74 + 0.6215\,T - 35.75\,V^{0.16} + 0.4275\,T\,V^{0.16}$
Wind speeds are for 0 to 40 mph; above 40, the difference in wind speed doesn't have much practical
impact on how cold you feel.
Evaluate this for all values of V (wind speed) from 0 to 40 mph in steps of 5, and all values of T
(temperature) from -10 to 40 in steps of 5.
3. Celsius to Fahrenheit Conversion Tables.
For values of Celsius from -20 to +30 in steps of 5, produce the equivalent Fahrenheit temperature.
The following formula converts C (Celsius) to F (Fahrenheit).
$F = 32 + \frac{212 - 32}{100} C$
For values of Fahrenheit from -10 to 100 in steps of 5, produce the equivalent Celsius temperatures.
The following formula converts F (Fahrenheit) to C (Celsius).
$C = (F - 32) \frac{100}{212 - 32}$
33(s f )
c(d + 33)
The suite is an indented block of statements. Any statement is allowed in the block, including indented
while statements.
As long as the expression is true, the suite is executed. This allows us to construct a suite that steps through
all of the necessary tasks to reach a terminating condition. It is important to note that the suite of statements
must include a change to at least one of the variables in the while expression. Should your program execute
the suite of statements without changing any of the variables in the while expression, nothing will change,
and the loop will not terminate.
There's an intentional parallelism between the while statement and the if statement. Both have a suite
which is only executed when the condition is True. The while statement repeatedly executes the suite,
where the if statement only executes the suite once.
100 Random Dice. Let's look at some examples. This first example is a revision of the last example in
The for Statement; it shows that there is considerable overlap between the while statement and the for
statement. Both can do similar jobs.
from __future__ import print_function
import random
sample= 0
while sample != 100:
    d1= random.randrange(6)+1
    d2= random.randrange(6)+1
    sample= sample + 1
    print(d1+d2)
This previous example is a mapping from the sample number, (sample) to two random dice, and then the
two dice are mapped to a single sum.
Sum of Odd Numbers. Here's a more sophisticated example that computes the sum of odd numbers 1
through 9.
The loop is initialized with num and total each set to 1. We specify that the loop continues while num != 9.
In the body of the loop, we increment num by 2, so that it will be the next odd value; we then increment
total by num, summing this sequence of odd values.
When this loop is done, num is 9, and total is the sum of the odd numbers through 9: 1+3+5+7+9. Also note
that the while condition depends on num, so changing num is absolutely critical in the body of the loop.
num, total = 1, 1
while num != 9:
    num= num + 2
    total= total + num
This example is a kind of reduction. We are reducing a sequence of odd numbers to a sum, which happens
to be the square of the number of values we summed.
Roll Dice Until Craps. Here's a more complex
example. This iteration counts dice rolls until we get a 7. Note that our loop depends on d1 and d2 changing.
Each time the suite inside the while statement finishes, we restore the initial condition of having unknown
values for d1 and d2.
from __future__ import print_function
import random
rolls= 0
d1,d2=random.randrange(6)+1,random.randrange(6)+1
while d1 + d2 != 7:
    rolls += 1
    d1,d2=random.randrange(6)+1,random.randrange(6)+1
print(rolls)
This example is a search. We are searching through a sequence of random dice rolls looking for the first seven.
We are reducing the list of dice rolls to a count of the number of rolls.
Tip: Debugging the while Statement
If you are typing a while statement, and you get a SyntaxError: invalid syntax, you probably omitted the :.
There are several problems that can be caused by an incorrectly designed while statement.
The while loop never stops! The first time you see this happen, you'll probably shut off your computer.
There's no need to panic, however; there are some better things to do when your computer appears hung
and doesn't do anything useful.
When your loop doesn't terminate, you can use Ctrl-C to break out of the loop and regain control of your
computer. Once you're back at the >>> you can determine what was wrong with your loop. In the case of a
loop that doesn't terminate, the while expression is always True. There are two culprits.
You didn't initialize the variables properly. The while expression must eventually become False for
the loop to work. If your initialization isn't correct, you may have created a situation where it will
never become False.
You didn't change the variables properly during the loop. If the variables in the while expression
don't change values, then the expression will never change, and the loop will either never iterate or it
will never stop iterating.
If your loop never operates at all, then the while expression is always False. This means that your
initialization isn't right. A few print statements can show the values of your variables so you can see
precisely what is going wrong.
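As a sketch of this debugging technique, here is a loop that sums odd numbers with one print() added to the suite. The list trace is an assumption added only so the values can be examined after the loop.

```python
num, total = 1, 1
trace = []    # a record of the variables at the end of each pass

while num != 9:
    num = num + 2
    total = total + num
    trace.append((num, total))
    print("num=%d total=%d" % (num, total))   # watch the variables change
```

If num never appeared to change in the printed lines, or jumped past 9 without ever being equal to it, the printout would show at once why the loop fails to stop.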
One rare situation is a loop that isn't supposed to operate. For example, if we are computing the average
of 100 dice rolls, we'll iterate 100 times. Sometimes, however, we have the degenerate case, where we are
trying to average zero dice rolls. In this case, the while expression may start out False for a good reason.
We can get into trouble with this if some of the other variables are not set properly. This can happen
when you've made the mistake of creating a new variable inside the loop body. To be sure that a loop is
designed correctly, all variables should be initialized correctly, and no new variables should be created within
the loop body; they should only be updated.
If your loop is inconsistent (it works for some input values, but doesn't work for others), then the body of
the loop is the source of the inconsistency. Every if statement alternative in the suite of statements within
the loop has to establish a consistent state at the end of the suite of statements.
Loop construction can be a difficult design problem. It's easier to design the loop properly than to debug a
loop which isn't working. We'll cover this in A Digression On Design.
3. The initialization for the outer loop creates a counter which will hold the total number of rolls before
getting a seven for all of the games. The loop uses range(100) to assure that we gather data for all
100 simulations. In effect, this loop is a mapping from simulation number (i) to a count of rolls before
rolling a 7. This outer loop is also a reduction that uses 100 simulations to compute the total number
of rolls to get a 7.
6. The initialization for the inner loop creates a counter (rolls) which will hold the number of rolls before
getting a seven in this game only. It also initializes a pair of dice (d1 and d2) to the first roll. This is
the typical initialization for a reduction.
7. While we haven't rolled a 7, we'll count one non-7 roll, then we'll roll the dice again. Note that the
body of the while statement starts with an unknown pair of dice. When the pair is evaluated and
found to be a number other than 7, a new pair of dice is created, restoring this condition that the dice
are unknown. This is a typical search loop: we are searching for a 7 and counting the number of rolls
until we find one.
10. Once we've rolled a 7, we add the number of rolls to our total. Since we are computing a single value
from 100 samples, this is a reduction.
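The notes above describe a nested loop. As a rough sketch only, reconstructed from those notes rather than copied from the original listing, the structure could look like this. The variable names total, rolls, d1 and d2 follow the notes; the line layout and the final print are assumptions.

```python
from __future__ import print_function
import random

total = 0                               # rolls before a 7, summed over all games
for i in range(100):                    # 100 simulated games
    rolls = 0                           # rolls before a 7 in this game only
    d1, d2 = random.randrange(6) + 1, random.randrange(6) + 1
    while d1 + d2 != 7:                 # search loop: keep rolling until a 7
        rolls += 1
        d1, d2 = random.randrange(6) + 1, random.randrange(6) + 1
    total += rolls                      # reduction: fold this game into the total
print("average rolls before a 7:", total / 100.0)
```

Because the dice are random, the printed average changes from run to run.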
In Simulating All 100 Rolls of the Dice, a for statement is used to iterate through 100 samples of data
gathering. Replace this for statement with the equivalent statements using while. Hint: you'll have
to add two new statements in addition to replacing the for statement.
2. Greatest Common Divisor.
The greatest common divisor is the largest number which will evenly divide two other numbers. Examples: GCD( 5, 10 ) = 5, the largest number that evenly divides 5 and 10. GCD( 21, 28 ) = 7, the
largest number that divides 21 and 28.
GCDs are used to reduce fractions. Once you have the GCD of the numerator and denominator, they
can both be divided by the GCD to reduce the fraction to simplest form. 21/28 reduces to 3/4.
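One hedged way to check a hand-written GCD, assuming Python 2.6 or later, is to compare against the standard library's fractions module, which reduces a fraction to simplest form when it is created. This is only a cross-check, not the algorithm the exercise asks for.

```python
from __future__ import print_function
from fractions import Fraction

reduced = Fraction(21, 28)      # the library reduces by the GCD internally
print(reduced.numerator, reduced.denominator)
```

If your own GCD of 21 and 28 is right, dividing both numbers by it should give the same numerator and denominator.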
Greatest Common Divisor of two integers, p and q
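\[ \frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \frac{1}{11} + \cdots \]
\[ \frac{\pi^2}{6} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots \]
\[ \pi = \sum_{0 \le k < \infty} \left( \frac{1}{16^k} \right) \left( \frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right) \]
\[ \frac{\pi}{2} = 1 + \frac{1}{3} + \frac{1 \cdot 2}{3 \cdot 5} + \frac{1 \cdot 2 \cdot 3}{3 \cdot 5 \cdot 7} + \cdots \]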
For each of these you'll need to construct a loop that develops each term and adds it in to the total.
At some point the terms will be so small that they don't contribute significantly to the answer; this is
when the loop should stop.
The third form uses summation (Σ) notation, telling us that the variable k takes on values from 0
to infinity. As a practical matter, k will go from zero to a value large enough that the expression
computed is about zero. For more information on the (Σ) operator, see Translating From Math To
Python: Conjugating The Verb To Sigma.
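As an illustration of the "develop a term, add it, stop when the term is tiny" pattern, here is a minimal sketch for the second series only. The cutoff of 1e-10 and the names total, term and approx_pi are assumptions chosen for the example.

```python
from __future__ import print_function
import math

total = 0.0
k = 1
term = 1.0                      # first term, 1/1**2
while term > 1e-10:             # stop once a term is too small to matter
    total += term
    k += 1
    term = 1.0 / (k * k)        # next term, 1/k**2
approx_pi = math.sqrt(6 * total)
print(approx_pi, math.pi)
```

This series converges slowly, so even with a small cutoff the result agrees with math.pi to only about four decimal places. The other forms need different term calculations and converge at different rates.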
7. Computing e.
A logarithm is a power of some base. When we use logarithms, we can effectively multiply numbers
using addition, and raise to powers using multiplication. Two functions in Python's math module are related to
this: math.log() and math.exp(). Both of these work with what are called natural logarithms, that
is, logarithms where the base is e. This constant, e, is available in the math module, and it has the
following formal definition:
Definition of e.
\[ e = \sum_{0 \le k < \infty} \frac{1}{k!} \]
For more information on the (Σ) operator, see Translating From Math To Python: Conjugating The
Verb To Sigma.
The n! operator is factorial. Interestingly, it's a post-fix operator: it comes after the value it applies
to.
\[ n! = n \times (n-1) \times (n-2) \times \cdots \times 1 \]
For example, 4! = 4 × 3 × 2 × 1 = 24. By definition, 0! = 1.
If we add up the values 1/0! + 1/1! + 1/2! + 1/3! + ⋯ we get the value of e. Clearly, when we get to about
1/10!, the fraction is so small it doesn't contribute much to the total.
Each term, 1/k!, appears to need its own factorial computation.
However, if we have a temporary value of k!, then each time through the loop we can multiply this
temporary by k, and then add 1/temp to the sum.
You can test by comparing your results against math.e (e ≈ 2.71828) or math.exp(1.0).
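The running-factorial idea can be sketched as follows. This is one possible arrangement, not the only one; the names fact and term and the cutoff of 1e-15 are assumptions.

```python
from __future__ import print_function
import math

total = 1.0                     # the k = 0 term, 1/0!
fact = 1                        # temporary value holding k!
k = 0
term = 1.0
while term > 1e-15:             # stop when a term no longer changes the sum much
    k += 1
    fact *= k                   # k! built from (k-1)! with one multiplication
    term = 1.0 / fact
    total += term
print(total, math.e)
```

Keeping fact between iterations means no inner loop is needed to compute each factorial.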
8. Hailstone Numbers.
For additional information, see [Banks02].
Start with a small number, n, 1 ≤ n < 30.
There are two transformation rules that we will use:
If n is odd, multiply by 3 and add 1 to create a new value for n.
If n is even, divide by 2 to create a new value for n.
Perform a loop with these two transformation rules until you get to n = 1. You'll note that when n =
1, you get a repeating sequence of 1, 4, 2, 1, 4, 2, ...
You can test for oddness using the % (remainder) operation. If n % 2 == 1, the number is odd,
otherwise it is even.
The two interesting facts are the path length, the number of steps until you get to 1, and the
maximum value found during the process.
Tabulate the path lengths and maximum values for numbers 1..30. You'll need an outer loop that
ranges from 1 to 30. You'll need an inner loop to perform the two steps for computing a new n until
n == 1; this inner loop will also count the number of steps and accumulate the maximum value seen
during the process.
Test: for 27, the path length is 111, and the maximum value is 9232.
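A minimal sketch of the inner loop alone, run for the single test value 27, might look like this. The names steps and largest are assumptions, and the outer loop over 1..30 is left for the exercise.

```python
from __future__ import print_function

n = 27
steps = 0                       # path length so far
largest = n                     # maximum value seen so far
while n != 1:
    if n % 2 == 1:
        n = 3 * n + 1           # odd rule
    else:
        n = n // 2              # even rule; // keeps n an integer
    steps += 1
    if n > largest:
        largest = n
print(steps, largest)
```

The result can be compared with the test values quoted above.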
This concept of a summary or abstraction that embodies a number of standard details is an important
tool for programmers. In future sections we'll talk about creating these kinds of processing summaries.
In effect, we'll add new verbs to the Python language.
When we look at our computer, the operating system, the Python program, we see this layering effect.
Each layer adds features, and makes the lower layers easier to use. The for statement continues this
layering by enabling us to write iterations in a single statement that would have taken three statements.
The break statement is always found within if statements within the body of a for or while loop. The
surrounding if statement has the terminating condition. A break statement can, for example, end a for
before the end of the sequence has been reached.
Here's a complex terminating condition: we want to simulate parts of a Craps game that ends when we roll
a 7 or the game lasts more than five rolls of the dice. We initialize our loop by determining two random
values for d1 and d2. Our loop will use the range(5) sequence of five values to provide an upper limit on
the number of dice we will roll. Also, we'll break out of the loop if the dice total 7.
from __future__ import print_function
import random
d1,d2=random.randrange(6)+1,random.randrange(6)+1
for i in range(5):
    if d1+d2 == 7:
        break
    d1,d2=random.randrange(6)+1,random.randrange(6)+1
if d1+d2 == 7:
    print("rolled 7")
else:
    print("5 rolls without a 7")
Here's a contrived example of using continue to gracefully ignore certain numbers in a sequence. In this
case, when i % 2 == 0, we have a number that can be divided by 2 with no remainder; an even number.
Since we continue the loop for even numbers, we will only accumulate odd numbers in total.
from __future__ import print_function
total = 0
for i in range(20):
    if i % 2 == 0:
        continue
    total += i
print("total", total)
sixodds.py
2. We import the random module, so that we can generate a random sequence of spins of a Roulette
wheel.
3. We initialize oddCount, our count of odd numbers seen in a row. It starts at zero, because we haven't
seen any odd numbers yet.
4. The for statement will assign 100 different values to s, such that 0 ≤ s < 100. This will control our
experiment to do 100 spins of the wheel.
5. Note that we save the current value of s in a variable called lastSpin, setting up part of our post-condition for this loop. We need to know how many spins were done, since one of the exit conditions
is that we did 100 spins and never saw six odd values in a row. This exit condition is handled by the
for statement itself.
6. We set n to a random spin of the wheel. We've asked for a random number from a pool of 38 numbers.
This is the size of the usual double zero Roulette wheel.
8. We'll treat 37 as if it were 00, which is like zero. In Roulette, these two numbers are neither even nor
odd. The oddCount is set to zero, and the loop is continued. This continue statement resumes the loop
with the next value of s. It restarts processing at the top of the for statement suite.
12. We determine whether the number is odd by testing to see if the remainder is 1 when the spin, n, is
divided by 2. If the spin is odd, the oddCount variable is incremented by 1.
14. We check the value of oddCount to see if it has reached six. If it has, one of the exit conditions is
satisfied, and we can break out of the loop entirely. We use the break statement to exit from the loop,
winding up after the for statement. If oddCount is not six, we don't break out of the loop; we use the
continue statement to restart the for statement suite from the top with a new value for s.
17. We threw in an assert (see the next section, The assert Statement, for more information on this
statement) that the spin, n, is even and not 0 or 37. This is kind of a safety net. If either of the
preceding if statements were incorrect, or a continue statement was omitted, this statement would
uncover that fact. We could do this with another if statement, but we wanted to introduce the assert
statement.
18. If the number is even, we also set the oddCount to 0.
19. At the end of the loop, lastSpin is the number of spins and oddCount is the most recent count of odd
numbers in a row. Either oddCount is six or lastSpin is 99. When lastSpin is 99, that means that spins
0 through 99 were examined; there are 100 different numbers between 0 and 99.
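The numbered notes can be read against a sketch like the following. It is reconstructed from the notes alone, so it should be taken as an assumption about the script's shape rather than the original sixodds.py listing; the comment lines and the final print are guesses.

```python
from __future__ import print_function
import random

oddCount = 0
for s in range(100):
    lastSpin = s
    n = random.randrange(38)
    # Zero or double zero: neither even nor odd
    if n == 0 or n == 37:
        oddCount = 0
        continue
    # Odd
    if n % 2 == 1:
        oddCount += 1
        if oddCount == 6: break
        continue
    # Anything left must be an even number from 2 to 36
    assert n % 2 == 0 and 0 < n <= 36
    oddCount = 0
print(oddCount, lastSpin)
```

Whatever the random spins turn out to be, the loop ends either with oddCount at six or with lastSpin at 99.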
If the assertion condition is False, the program is in error, and raises an AssertionError exception. If the
expression is given, the AssertionError exception is raised using the expression. We'll cover exceptions in
detail in The Unexpected : The try and except statements. For now, the most important part of raising an
exception is that the program stops.
Here's an example of using assert to prove that our program works. We're trying to set max to the larger
of two values, a or b. We include an assertion with a formal definition of what value max should have. It
should be either a or b, and the larger of the two values.
max= 0
if a < b: max= b
if b < a: max= a
assert (max == a or max == b) and max >= a and max >= b
If the assertion condition is true, the program continues. If the assertion condition is false, the program
raises an AssertionError exception and stops, showing the line where the problem was found.
Run this program with a equal to b and not equal to zero; it will raise the AssertionError exception.
Clearly, the if statements don't set max to the larger of a and b when a = b. There is a problem in the if
statements, and the presence of the problem is revealed by the assertion.
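To see the failure without the program stopping, the same logic can be run with a equal to b inside a try statement. This is a sketch under assumptions: the variable is renamed larger to avoid hiding Python's built-in max(), the values 3 and 3 are arbitrary, and try and except are used ahead of their introduction only so the script can report what happened.

```python
from __future__ import print_function

a, b = 3, 3
larger = 0
if a < b: larger = b
if b < a: larger = a
try:
    # The optional second expression becomes the message in the exception.
    assert (larger == a or larger == b) and larger >= a and larger >= b, \
        "larger is %r" % larger
    message = "assertion passed"
except AssertionError as error:
    message = "assertion failed: " + str(error)
print(message)
```

Neither if statement fires when a equals b, so larger keeps its initial value of 0 and the assertion reports the problem.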
Tip: Debugging the assert Statement
The assert statement is an important tool for debugging other problems in your program. It is rare to have
a problem with the assert statement itself. The only thing you have to provide is the condition which must
be true. If you can't formulate the condition in the first place, it means you may have a larger problem in
describing what is supposed to be happening in the program in general. If so, it helps to take a step back
from Python and try to write an English-language description of what the program does and how it works.
Clear assert statements show a tidy, complete, trustworthy, reliable, clean, honest, thrifty program. Seriously. If you can make a clear statement of what must be true, then you have a very tight grip on what
should be happening and how to prove that it really is happening. This is the very heart of programming:
translating the program's purpose into a condition, creating the statements that make the conditions true,
and being able to back this design up with a proof and a formal assertion.
If the else clause is provided, the else-suite of statements is executed when the loop terminates normally.
This suite is skipped if the loop is terminated by a break statement.
The else clause on a loop might be used for some post-loop cleanup. This is so unlike other programming
languages, that it is hard to justify using it.
Even in the if statement, an else clause raises a small question when it is used. It's never perfectly clear
what conditions lead to execution of an else clause. The condition that applies has to be worked out from
context. For instance, in if statements, one explicitly states the exact condition for all of the if and elif
clauses. The logical inverse of this condition is assumed as the else condition. It is, unfortunately, left to
the person reading the program to work out what this condition actually is.
Similarly, the else clause of a while statement is the basic loop termination condition, with all of the
conditions on any break statements removed. The following kind of analysis can be used to work out the
condition under which the else clause is executed.
while not BB:
    if C1: break
    if C2: break
else:
    # Implied: BB and not C1 and not C2
    pass
assert BB or C1 or C2
Because this analysis can be difficult, it is best to avoid the use of else clauses in for or while statements.
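For readers who meet the construct in someone else's program, here is a small concrete sketch of a for statement with an else clause. The fixed tuple of dice totals is an assumption made so that the result is predictable.

```python
from __future__ import print_function

rolls = (4, 9, 11, 6, 8)        # made-up dice totals, none of them 7
for total in rolls:
    if total == 7:
        outcome = "rolled 7"
        break
else:
    # Reached only when the loop ran to the end without a break.
    outcome = "no 7 in %d rolls" % len(rolls)
print(outcome)
```

Changing any of the totals to 7 makes the break fire, and the else suite is then skipped.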
if a>=b: m=a
elif b>=a: m=b
This if statement has the statements that will set m to the larger of a or b. Each assignment is associated
with a condition under which that assignment statement solves the problem.
The Post-Condition. Note that the hard part is establishing the post-condition. Once we have that
stated correctly, it's relatively easy to figure the basic kind of statement that might make some or all of the
post-condition true. Then we do some algebra to fill in any guards or loop conditions to make sure that
only the correct statement is executed.
There are several considerations when using the while statement. This list is taken from David Gries, The
Science of Programming [Gries81].
1. The variables changed in the body of the loop must be initialized properly. If the loop's while-expression
is initially false, everything is set correctly.
2. At the end of the suite, the condition that describes the state of the body variables is just as true as
it was after initialization. This is called the invariant, because it is always true during the loop.
3. When this invariant body condition is true and the while-expression is false, the loop will have completed properly.
4. When the while-expression is true, there are more iterations left to do. If we wanted to, we could define
a mathematical function based on the current state that computes how many iterations are left to do;
this function must have a value greater than zero when the while-expression is true.
5. Each time through the loop we change the state of our variables so that we are getting closer to making
the while-expression false; we reduce the number of iterations left to do.
While these conditions seem overly complex for something so simple as a loop, many programming problems
arise from missing one of them.
Gries recommends putting comments around a loop showing the conditions before and after the loop.
Python provides the assert statement, which formalizes these comments into actual tests to be sure the
program is correct.
An Example. Let's put a particular loop under the microscope. This is a small example, but shows all of
the steps to loop construction. We want to find the least power of 2 greater than or equal to some number
greater than 1, call it x. This power of 2 will tell us how many bits are required to represent x, for example.
We can state this mathematically as looking for some number, n, such that $2^{n-1} < x \le 2^n$. This says
that if x is a power of 2, for example 64, we'd find $2^6$. If x is another number, for example 66, we'd find
$2^6 < 66 \le 2^7$, or $64 < 66 \le 128$.
We can start to sketch our loop.
assert x > 1
... initialize ...
... some loop ...
assert 2**(n-1) < x <= 2**n
We work out the initialization to make sure that the invariant condition of the loop is initially true. Since
x must be greater than 1, we can set n to 1. We can see that $2^{1-1} = 2^0 = 1 < x$. This will set
things up to satisfy rules 1 and 2.
assert x > 1
n= 1
... some loop ...
assert 2**(n-1) < x <= 2**n
In loops, there must be a condition on the body that is invariant, and a terminating condition that changes.
The terminating condition is written in the while clause. In this case, it is invariant (always true) that
$2^{n-1} < x$. That means that the other part of our final condition is the part that changes.
assert x > 1
n= 1
while not ( x <= 2**n ):
    n= n + 1
    assert 2**(n-1) < x
assert 2**(n-1) < x <= 2**n
The next to last step is to show that when the while condition is true, there are more than zero trips through
the loop possible. We know that x is finite and some power of 2 will satisfy this condition. There exists
some n such that $n - 1 < \log_2 x \le n$ that limits the trips through the loop.
The final step is to show that each cycle through the loop reduces the trip count. We can argue that
increasing n gets us closer to the upper bound of $\log_2 x$.
We should add this information on successful termination as comments in our loop.
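As a quick check of the construction, the loop can be run for a few sample values with the termination argument written in as comments. The sample values 2, 64, 66 and 1000 are assumptions chosen to cover exact powers of 2 and numbers in between.

```python
from __future__ import print_function

for x in (2, 64, 66, 1000):
    assert x > 1
    n = 1
    # Invariant: 2**(n-1) < x.
    # Trips remaining: n grows by 1 each time and cannot pass log2(x) rounded up.
    while not (x <= 2**n):
        n = n + 1
        assert 2**(n-1) < x
    assert 2**(n-1) < x <= 2**n
    print(x, "fits under 2 **", n)
```

If any of the assert statements were wrong for one of these values, the script would stop with an AssertionError instead of printing a line for it.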
6.8.1 Comments
Comments are notes and remarks to people who are looking at the program. They aren't used by the Python
program, but can be reminders and clarifications.
Comments can be placed anywhere in our program. They're covered by the syntax rules missing from Instant
Gratification : The Simplest Possible Conversation.
Important: Syntax Rule Three
Everything from a # to the end of the line is ignored by the Python program.
If the # occurs inside a quoted string, it is just another character. The # for a comment must occur outside
a string.
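A two-line sketch shows both cases; the string "Room #12" is a made-up value.

```python
from __future__ import print_function

text = "Room #12"   # inside the quotes, # is just a character; out here it starts a comment
print(text)
```

The # inside the string is printed along with the rest of the text, while everything after the second # is ignored.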
The GNU/Linux shell can sneak a look at the first line of a Python file. If the first line is the special #!
comment, this defines the interpreter that will be used. Consequently, many Python files begin with the
following line:
#!/usr/bin/env python
Change your current working directory to the correct location of your files. For Windows: use
CD; for GNU/Linux and MacOS: use cd. For example, if your files are in an exercises directory,
you can do cd exercises.
Include the directory name on your file. For example, if your files are in an exercises directory,
you can run the script1.py script with python exercises/script1.py.
3. If you can find Python, and you appear to be in the correct directory, the remaining problem is
misspelling the filename for your script. This is relatively common, actually. First time GNU/Linux
and MacOS users will find that the shell is sensitive to the case of the letters, that some letters look
alike, that it is possible to embed non-printing characters in a file name, and that it is unwise to use
letters which confuse the shell. We have the following advice.
File names in GNU/Linux should be one word, all lower case letters and digits. These are the
standard Python expectations for module names. While there are ways around this by using the
shell's quoting and escaping rules, Python programs avoid this.
File names should avoid punctuation marks. There are only a few safe punctuation marks: -, .
and _. Even these safe characters should not be the first character of the file name.
Some Windows programs will tack an extra .txt on your file. You may have to manually rename
the file to get rid of this.
In GNU/Linux, you can sometimes embed a space or non-printing character in a file name. To
find this, use ls -b to see the non-printing characters. You'll have to resort to fairly complex
shell tricks to rename a badly named file to something more useful. The ? character is a wild-card
which matches any single character. If you have a file named script^M1.py, you can rename this
with mv script?1.py script1.py. The ? will match the unprintable ^M in the file name.
Looking Forward. In the long run, we don't always want to have our Python application program files
and our data files all mixed up together in the same directory. We'd like to be able to put our programs in
a directory like /usr/local/myapp or C:\Program Files\MyApp. GNU/Linux and MacOS have many tricks
for making programs easy to start. We'll look at some techniques for this in the next section.
The very cool part of this trick is that #! is a comment to Python. Comments are simply ignored by Python.
This first line of our Python file is, in effect, directed at the shell. The shell uses the first line of our file as
a hint to see what the language is, and Python studiously ignores it.
A file that is going to be executable would look like this:
celsius.py
#!/usr/bin/env python
# Convert 65F to Celsius
from __future__ import print_function
print(65, "F")
print((65-32) * 5 / 9, "C")
The last example shows the prompt from a RedHat GNU/Linux computer, named linux01.
Security Consideration
This last example depends on the way the shells assure security. The ./ prefix forces the shell to look
in the current directory for the file name that matches the command. This is optional in some shell
environments; it always works, but isn't always necessary.
The reason for this extra punctuation is to make it difficult to override the built-in shell command
names with rogue viruses or spyware. Think of what could happen if you wrote your own version of
chmod which could then add a virus to every file that was marked executable. The way to prevent
this is to assure that the search path is tightly controlled and the user has to specifically add to the
search path to run a program that isn't built in to the operating system. Since no shell script would
ever say ./chmod, a rogue chmod program would never get used.
Bottom Line. We have to do two things to make this implicit execution work. We have to make our file
executable with chmod, and we have to include the magic #!/usr/bin/env python line as the very first
line in the file.
We need to emphasize two parts of this recipe.
The executable mode setting remains with a file forever, so we only need to set it once, when we create
the file. When we forget to do this [I don't say "if we forget"; everyone forgets], we get an error that
says our file can't be found as a command. This is a hint that we forgot to mark the file as executable.
The chmod command is done in the Terminal, not in IDLE. This is a command to the shell, and is
not a Python statement. It helps to have an extra terminal window open for this kind of thing.
This file association is typically set by the Python installer. In the unlikely event that you don't have a proper
association between Python files and the Python applications, you can create or modify this association with
the Folder Options control panel. The File Types tab allows you to pair a file type with a program that
processes the file. It is often simpler to uninstall and reinstall Python.
Setting the Python File Association
1. Open the Control Panel
Use the Start menu, Settings sub menu to locate your Control Panel.
2. Open the Folder Options Control Panel
Double-click the Folder Options Control Panel. This opens the Folder Options panel.
3. Open the File Types Tab of the System Control Panel
Click the File Types tab on the Folder Options Control Panel.
There are two areas: Registered File Types and Details for the selected type.
Note that we included the #! code on the first line. This is a Python comment; it is really only used by
GNU/Linux users, but it doesn't hurt for Windows or MacOS programmers to include this.
After we finish editing, we mark the file we made as executable using chmod +x example2.py. Since this
is a property of the file, once we've set it, the executable status remains true no matter how many times we
edit, copy or rename the file. We only have to make a file executable once, the first time we work with it.
When we run this in GNU/Linux or MacOS, we see the following.
[slott@linux01 slott]$ ./example2.py
0.0112962280375
This says that spinning six reds in a row is about a 1 in 89 probability. If we won all six spins in a row,
we'd have made 64 times our bet.
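The printed figure is consistent with 18 red pockets out of 38 on a double zero wheel, six times in a row. The sketch below reproduces the arithmetic under that assumption; it is not the example2.py script itself, and the names p_red and p_six are made up for the illustration.

```python
from __future__ import print_function

p_red = 18.0 / 38.0         # assumed: 18 red pockets on a 38-pocket wheel
p_six = p_red ** 6          # six independent reds in a row
print(p_six)
print("about 1 in", round(1 / p_six))
print("payout multiple:", 2 ** 6)   # doubling the stake on each of six wins
```

The last line shows where the figure of 64 comes from: letting the winnings ride doubles the stake six times.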
The typical trick in GNU/Linux is to put our Python and shell files into a single directory, and then add
this directory to the PATH setting. We might put our files into /usr/local/myapp, then add
/usr/local/myapp to the PATH.
Windows .BAT File. Here's the one line we need to put into our file for Windows folks who created a
C:\MyApp\bin\example1.py. If we make sure that this example1.bat file is on our path, we can then be
in any working directory and run our example.
Note that we've studiously avoided a filename with a space in it. We didn't put our application into
C:\Program Files because we'd have to work around that pesky space.
example1.bat
python C:\MyApp\bin\example1.py
The typical trick in Windows is to put our Python and shell files into a single directory, and then add this
directory to the PATH setting. We might put our files into C:\MyApp\bin, then add C:\MyApp\bin to the PATH.
CHAPTER
SEVEN
Our initial programs have been sequences of statements. As our programs get more complex, we will find
that this style of long, flat program is hard to work with. In Adding New Verbs : The def Statement
we'll introduce the primary method for structuring and organizing our application programs, the function.
It turns out that breaking a program into separate functions allows us to decompose a solution into several
simpler parts. Functions are also a good intellectual tool to help us divide and conquer a complex problem.
We'll add several useful features in Flexibility and Clarity : Optional Parameters, Keyword Arguments.
These will add flexibility so that it's easier to understand and use the functions we define.
In A Few More Function Definition Tools, we'll show a number of unique features that make Python's
function definitions much cooler than other programming languages.
We can think of multiplication as a function, also. Multiplication maps a pair of values, a and b, to a new
value, c, such that c = a × b. The domain is pairs of numbers, a, b; the range is a number.
When we looked at the functions in the math module in Better Arithmetic Through Functions, they fit this
mold perfectly. The random module, however, has a bit of a problem. In The random Module - Rolling the
Dice we saw that many of these functions don't have a domain value, they only have a range value.
Oddly, the raw_input() function that we looked at in Can We Get Your Input? allows the user to enter the
range value. This doesn't seem to fit the strict mathematical sense of mapping from domain to range. There's
no real domain and the user could enter just about anything they wanted.
The Language of Planet Python. Clearly, Python doesn't adhere to the letter of the formal mathematical
definition. This is one of those cases where the computer science folks borrowed a word from mathematics, but
had to stretch the meaning a bit to make it useful. While many Python functions are proper mathematical
functions, Python allows us to use some additional patterns. We can define functions which do not need a
domain value, but create new objects from scratch. Also, we can define functions that don't return values,
but instead have some other side-effect, like creating a directory, or removing a file.
So what is a function in Python? A Python function is more like a verb than it is like a mapping. The
mathematical functions, for example, have an implied sense of "compute". You can think of sqrt() as
"compute the square root of". You can also think of it as "map", as in "map a number to its square root".
Factory functions are a little different, in that less transformation is done. These are generally just a change
in representation. You can think of the factory functions (in Functions are Factories (really!)) as ways to
"create from"; int() can be interpreted as "create the int value from".
Defining a Function. In Python, we define a function by providing three pieces of information:
The name of the function. Hopefully this is descriptive; usually it is verb-like.
A list of zero or more variables, called parameters; this defines the domain or input values. The phrase
"zero or more" means that parameters are optional.
A suite of one or more statements. If this contains a return statement, this defines the range or output
value. The phrase "one or more" means that statements are not optional.
Interestingly, the return statement is optional.
Typically, we create function definitions in script files because we don't want to type them more than once.
We can then import the file with our function definitions so we can use them. IDLE helps us do this import
with the Run menu, Run Module item, usually F5.
Using a Function. When we used functions like math.sqrt() in an expression, we provided argument
values to the function in (). The Python interpreter evaluates the argument values, then applies the function.
When we use functions that we define, we'll use the name we gave to our function in front of the (). For
more information on this evaluate-apply cycle, see The Evaluate-Apply Principle.
Evaluating and applying a function (sometimes termed calling the function) means that Python does the
following:
1. Evaluate the argument expressions.
2. Assign the argument values to the function parameter variables.
3. Evaluate (or call) the suite of statements that are the function's body. In this body, the return
statement defines the result value for the function. If there is no return statement, the value None is
returned.
4. Replace the function with the returned value, and finish evaluation of the expression in which the
function was used.
We have to make a firm terminology distinction between an argument value, an object that is created or
updated during execution, and the defined parameter variable of a function. The argument is the object
used in a particular application of a function; it has a life before and after the function. The parameter is the
name of a variable that is part of the function, and is a variable that exists only while Python is evaluating
the function body.
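The four steps and the argument/parameter distinction can be traced in a tiny sketch. The function double() and the variable names are hypothetical, invented for this illustration.

```python
from __future__ import print_function

def double(value):              # value is the parameter; it exists only during a call
    return value * 2

count = 3                       # an object with a life outside the function
# Step 1: count + 1 is evaluated to 4.  Step 2: value is assigned 4.
# Step 3: the body runs and returns 8.  Step 4: 8 replaces double(count + 1).
result = double(count + 1) + 1
print(count, result)
```

The variable count is untouched by the call; only the argument value 4 was handed to the parameter.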
The name is the name by which the function is known. It must be a legal Python name; the rules are the
same for function names as they are for variable names. The name must begin with a letter (or _) and can
have any number of letters, digits or _. See Python Name Rules.
Each parameter is a variable name; these names are the local variables which will be assigned to actual
argument values when the function is applied. We don't type the [ and ]s; they show us that the list of
names is optional. We don't type the ...; it shows us that any number of names can be provided. Also, the
, shows that when there is more than one name, the names are separated by ,.
The suite (which must be indented) is a block of statements that computes the value for the function. Any
statements may be in this suite, including nested function definitions.
The first line of a function is expected to be a document string (called a docstring, and generally a triple-quoted """ string) that provides a basic description of the function. We'll return to this docstring in
Functions Style Notes.
Returning a Result. A return statement specifies the result value of the function. This value will
become the result of applying the function to argument values. This value is sometimes called the effect of
the function.
return [ expression ]
The expression is the final result of the function. We don't type the [ and ]s; they show us that the expression
is optional. If we don't provide one, the Python value of None will be returned.
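A short sketch shows what comes back from a function that has no return statement at all. The function announce() is hypothetical.

```python
from __future__ import print_function

def announce(spin):
    """Print the spin; there is deliberately no return statement."""
    print("spin is", spin)

result = announce(7)
print(result is None)
```

The function still does its work through the print side-effect, but the value it hands back to the caller is None.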
Let's look at a complete, although silly, example.
def odd( spin ):
    """Return "odd" if this spin is odd."""
    if spin % 2 == 1:
        return "odd"
    return "even"
We name this function odd(), and define it to accept a single parameter, named spin. We provide a docstring
with a short description of the function. In the body of the function, we test to see if the remainder of spin ÷ 2
is 1; if so, we return "odd". Otherwise, we return "even".
Use. We would use our odd() function like this. This example will generate a random spin, s, between 0
and 36. (These are the rules for European Roulette, with a single zero.) We'll use our odd() function to
determine if the spin was even or odd.
from __future__ import print_function
import random
s = random.randrange(37)
if s == 0:
    print("zero")
else:
    print(s, odd(s))
If you want to see the power of the docstring, look back at our odd() function. Here's what happens when
we ask for help.
>>> help(odd)
Help on function odd in module __main__:
odd(spin)
Return "odd" if this spin is odd.
If you're using Python directly, that is, you are not using IDLE, this will look a little different. See the
sidebar for a little bit of information on the help viewer that may be used.
Tip: Direct Python and Help()
When executing help() while using Python directly (not using IDLE), you'll be interacting with a help
viewer that allows you to scroll forward and back through the text.
For more information on the help viewer, see Getting Help.
On the Mac OS or GNU/Linux, you'll see an (END) prompt telling you that you've reached the end of the document;
hit q to exit from viewing help.
Since our docstring shows up when we ask for help, we should be sure that we've put down everything we
need to remember about the function.
Rules of the Game. There are two important rules that bracket what a function can be used for. These
are constraints on what is a sensible definition of a function. Some functions will bend the second rule a bit.
1. A function has no memory. We call this stateless. We'd like to call this the "no hysteresis" rule
because the word hysteresis is exactly what we're talking about; but hysteresis is a pretty obscure term
for "influenced by previous events". When we look at a function like sine or square root, the answer
doesn't depend on the previous requests for sines or square roots. The result only depends on the
inputs.
2. A function is idempotent. The term idempotency means that a function, given the same inputs,
always produces the same outputs. This is part of the standard mathematical definition of a function:
the same input produces the same output.
The random module has functions that bend this rule. Also raw_input() bends this rule.
These rules are so important that Python enforces them. The way Python enforces these rules is by automatically deleting any variables created inside a function when the function finishes.
You'll note that our random-number generating functions violate the idempotency rule. Each time you apply
the randrange() function, you get a different value. Clearly, this random number generator function does
something special and unusual to work around Python's enforcement of the rules. We'll return to this below.
When you seem to need a function that has a memory or a state change, you aren't really talking about a
function anymore. To break the "no hysteresis" rule, you'll need to define an object, not a function. Defining
object classes will require many more language features than we've seen so far, so we'll introduce this later,
in Data + Processing = Objects.
Generally, any variable you use within a function body is private to that body. This is because all of a
function's variable names exist in a namespace that is local to the function. This includes the parameter
variables created by the definition and any local variables your statements create. The namespace (and the
variable names) cease to exist when the function's processing is complete. We'll look at this more closely in
Keeping Track of Variable Names - The Namespace.
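Here is a small sketch of a local name disappearing once the function finishes. The names double() and twice are invented for this illustration.

```python
def double(x):
    """double(x) -> number"""
    twice = 2 * x      # twice exists only while double() is running
    return twice

answer = double(21)
print(answer)

try:
    twice              # the local name is gone once the function has finished
    leaked = True
except NameError:
    leaked = False
print(leaked)
```

The value survives because it was returned and assigned to answer. The local name twice does not survive, so asking for it outside the function raises a NameError.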
Mathematical Functions. A mathematical function follows the standard definition of a transformation
from a domain to a range. All of the functions in the math module are examples of these. We can copy this
design pattern and create functions which transform an input to produce an output. Our example of odd()
in Function Definition: The def and return Statements followed this pattern.
These functions have no hysteresis (no memory) and are idempotent (same results for the same input). These
are well-behaved, and use a return statement to return a meaningful value.
The docstrings for these functions always look like this:
def myFunction( a, b ):
    """myFunction(a,b) -> someAnswer

    Some short, clear explanation of myFunction.
    """
    The suite of statements
Procedure Functions. One common kind of function is one that doesn't return a result, but instead carries
out some procedure. This function would omit any return statement. Or, if return statements are used to
exit from the function, they would have no value to return. Carrying out an action is sometimes termed a
side-effect of the function. The primary effect is the value returned.
These functions still have no hysteresis (no memory) and are idempotent. They just don't return a value.
Instead, we expect that their side-effect is the same each time we call them.
Here's an example of a function that doesn't return a value, but carries out a procedure.
def report( spin ):
    """report(spin)

    Reports the current spin."""
    if spin == 0:
        print("zero")
        return
    print(spin, odd(spin))
This function, report(), has a parameter named spin, but doesn't return a value. Here, the return
statement exits the function but doesn't return a value.
This kind of function would be used as if it was a new Python language statement, for example:
for i in range(10):
    report( random.randrange(37) )
Here we execute the report() function as if it was a new kind of statement. We don't evaluate it as part of
an expression.
It turns out that any expression can be used as a complete statement. Since a function evaluation is an
expression, and an expression is a statement, a function call is a complete statement. Because of this, a
function definition can be like adding a new statement to the language.
The simple return statement, by the way, returns the special value None. This default value means that
you can define your function like report(), above, use it (incorrectly) in an expression, and everything will
still work out nicely because the function does return a value.
for i in range(10):
    t= report( random.randrange(37) )
    print(t)
object as being encapsulated in the module. We'll look at this later, when we talk about modules in Modules
: The unit of software packaging and assembly.
These functions can have hysteresis and may (or may not) be idempotent. In the case of random numbers,
we don't want idempotency; otherwise, we'd just get the same number over and over again.
These functions broke the rules by using an object that is part of the module that contains the function. For
example, our random number generator functions use an object that is part of the random module. This is
almost the only example of this kind of accessor function that we'll use.
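One way to glimpse the hidden object inside the random module is to reset it to a known state. This sketch uses random.seed(), a standard function the text has not introduced at this point; the seed value 42 is arbitrary.

```python
import random

random.seed(42)              # put the module's hidden generator into a known state
a = random.randrange(37)
b = random.randrange(37)     # a second request moves the hidden state along

random.seed(42)              # put the generator back into the same starting state
c = random.randrange(37)

print(a == c)
```

This prints True: the first spin after each seed() is the same. The memory lives in the module's object rather than in the randrange() function itself.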
Spin is a String"""
def spinWheel():
    """Returns a string result from a Roulette wheel spin."""
    t= random.randrange(38)
    if t == 37:
        return "00"
    return str(t)

for i in range(12):
    n= spinWheel()
    report( n )
1. The odd() function is a simple mathematical function with a domain of numbers and a range of boolean
(True, False). If the number is odd, this function returns True; otherwise it returns False.
2. The report() function uses the odd() function to determine if the number is even or odd and write
an appropriate line to our final report. This function doesn't return a useful value, and is a kind of
procedural function.
3. The spinWheel() function uses random.randrange() to simulate a spin of the wheel and return that
value.
4. The main part of this program is the for loop at the bottom that uses the previous function definitions. It calls spinWheel(), and then report(). This generates and reports on a dozen spins of the
wheel.
For most of our exercises, this free-floating main procedure is acceptable. When we cover modules, in Modules
: The unit of software packaging and assembly, we'll need to change our approach slightly to something like
the following.
def main():
    for i in range(12):
        n= spinWheel()
        report( n )

main()
This makes the main operation of the script clear, since we put it in a function named main().
"""
return number % 2 == "1"
1. We selected Run Module from the Run menu. Python imported our function1.py module to our
Python Shell.
2. We entered odd(2) and Python's value for this function was False. That's correct.
3. We entered odd(3) and Python's value was also False. That can't be correct.
What's wrong? How do we fix it? There aren't many things that can be wrong in this function. We've
made a common mistake and used a string where we should have used a number. Look closely at the return
statement.
The number % 2 == "1" should be number % 2 == 1. We need to fix function1.py.
After we fix function1.py, we can loop back to step 2 in our procedure. This will remove the old definitions,
re-import our function and rerun our test. This whole sequence is handled by the Run Module item on the Run menu,
available as F5. It clears out the old definitions by restarting Python and then importing our module.
In this case, we've got the function working correctly. Here's the corrected version.
function1.py Final Version
#!/usr/bin/env python
def odd( number ):
    """odd(number) -> boolean

    Returns True if the given number is odd.

    >>> odd(2)
    False
    >>> odd(3)
    True
    """
    return number % 2 == 1
Here's our interaction in the Python Shell window. The two function calls and their answers are a handy
little summary of how this function is supposed to work. Notice that we did a cut and paste from the Python
Shell window into the docstring inside the function. That's the clearest way to define the function's intended
purpose.
>>> ================================ RESTART ================================
>>>
>>> odd(2)
False
>>> odd(3)
True
>>>
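Because the docstring now holds a copy of a real Python Shell session, Python can replay it for us. The following is a sketch that uses the standard doctest module, which this section has not introduced. The names finder, runner and results are ours, and the module=False argument simply tells doctest not to go looking for a source file.

```python
import doctest

def odd(number):
    """odd(number) -> boolean

    Returns True if the given number is odd.

    >>> odd(2)
    False
    >>> odd(3)
    True
    """
    return number % 2 == 1

# Collect the >>> examples from the docstring and run each one.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(odd, "odd", module=False, globs={"odd": odd}):
    runner.run(test)

results = runner.summarize(verbose=False)
print(results.failed)
print(results.attempted)
```

With the corrected function, no examples fail out of the two attempted. If we put the "1" mistake back, doctest would report that odd(3) did not produce True.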
Next Steps. Once we have the odd() function working, we can move on to debugging the spin() function,
then the report() function and finally the main procedure that produces the report. We call this building and
testing in pieces "iterative" or "incremental" development.
We've replaced x*y with mul(x,y), and replaced x-y with sub(x,y). This allows us to more clearly see how
evaluate-apply works. Each part of the expression is now written as a function with one or two arguments.
First the arguments are evaluated, then the function is applied to those arguments.
Here's the illustration of what has to happen to evaluate these functions.
function reaches down to get data from a lower-level function and passes the results back up to a higher-level
function.
We're going to show this as a list of steps, with > to show how the various operations nest inside each other.
Evaluate the arg to math.sqrt:
> Evaluate the args to sub:
> > Evaluate the args to mul:
> > > Get the value of b
> > Apply mul to b and b, creating r3=mul( b, b ).
> > Evaluate the args to mul:
> > > Evaluate the args to mul:
> > > > Get the value of a
> > > Apply mul to 4 and a, creating r5=mul( 4, a ).
> > > Get the value of c
> > Apply mul to r5 and c, creating r4=mul( mul( 4, a ), c ).
> Apply sub to r3 and r4, creating r2=sub( mul( b, b ), mul( mul( 4, a ), c ) ).
Apply math.sqrt to r2, creating r1=math.sqrt( sub( mul( b, b ), mul( mul( 4, a ), c ) ) ).
Notice that a number of intermediate results were created as part of this evaluation. If we were doing this
by hand, we'd write these down as steps toward the final result.
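The steps above can be written out one intermediate result at a time. This sketch assumes mul() and sub() come from the standard operator module, and it picks sample values for a, b and c purely for illustration.

```python
import math
from operator import mul, sub

# Sample values, chosen only for this illustration.
a, b, c = 1, 5, 6

r3 = mul(b, b)          # b*b
r5 = mul(4, a)          # 4*a
r4 = mul(r5, c)         # (4*a)*c
r2 = sub(r3, r4)        # b*b - 4*a*c
r1 = math.sqrt(r2)      # the final result

# The same computation written as one nested expression.
nested = math.sqrt(sub(mul(b, b), mul(mul(4, a), c)))
print(r1 == nested)
```

Both forms produce the same value. The nested expression just leaves the intermediate results r2 through r5 unnamed.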
payment( principle,
interestRate,
See Simulating All 100 Rolls of the Dice for a simple loop that writes one hundred dice rolls. We
can define a function which gets two random die values and returns the sum. You can replace the
random-number generation with a slightly simpler-looking function call.
You'll replace the following two lines with a function call.
d1= random.randrange(6)+1
d2= random.randrange(6)+1
3. Factorial Function.
Factorial of a number n is the number of possible arrangements of 0 through n things. It is computed
as the product of the numbers 1 through n. That is, $1 \times 2 \times 3 \times \cdots \times n$.
The formal definition is
$n! = n \times (n-1) \times (n-2) \times \cdots \times 1$
$0! = 1$
We touched on this in Computing e. This function definition can simplify the program we wrote for
that exercise.
Factorial of an integer, n
In Object Methods - A Cousin of Functions we'll describe how to use method functions as a prelude to
subjects in Basic Sequential Collections of Data. Methods are a kind of first cousin to functions.
There is even more sophistication in how Python handles function parameters. Unfortunately, this has to
be deferred to A Dictionary of Extra Keyword Values, as it depends on a knowledge of dictionaries, which
we won't get to until :refdata.map.
The [ parameter [ = initializer ] ] tells us that a parameter, in general, is optional. Recall that we don't
actually enter the [ and ]s; they're markers to help us understand optional parts of the syntax. The [ =
initializer ] tells us that a parameter may or may not have an initial value. While the [ and ]s tell us that
the initializer is optional, the = is essential punctuation for separating the parameter name from the initial
value.
Many Options. When there are a number of optional elements we will have several forms of function
definitions. We'll look at some of the various combinations that are available.
def myFunction(): is the no-parameters version. When you evaluate this function, you don't provide
any argument values.
def myFunction(req): defines a required parameter. When you evaluate this function, you must
provide an argument value for the required parameter.
def myFunction(opt=value): uses an initializer to define an optional parameter. When you evaluate
this function, you may provide an argument value for this optional parameter. If you don't provide an
argument value, the default value will be used.
def myFunction(req,opt=value): is a mixture of required and optional values. With one optional
parameter, there are two ways to call this function: myFunction(r) and myFunction(r,o).
def myFunction(req,opt1=value,opt2=value): is a mixture of required and optional values. With
two optional parameters, there are four ways to call this function.
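As a sketch of the last form, here are four calls to a function with one required and two optional parameters. The default values "x" and "y" are placeholders chosen for this illustration, and the fourth call names opt2 explicitly, a form of call the text returns to under keyword parameters.

```python
def myFunction(req, opt1="x", opt2="y"):
    """myFunction(req, opt1, opt2) -> string showing the values received."""
    return "%s %s %s" % (req, opt1, opt2)

call1 = myFunction("r")                  # both defaults are used
call2 = myFunction("r", "o1")            # opt2 uses its default
call3 = myFunction("r", "o1", "o2")      # no defaults are used
call4 = myFunction("r", opt2="o2")       # opt1 uses its default; opt2 is given by name

print(call1)
print(call2)
print(call3)
print(call4)
```

Each call supplies the required value; the calls differ only in which optional parameters fall back to their defaults.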
Other Kinds of Dice. Here's a small example of the "most of the time with exceptions" design pattern.
We've been talking on and off about the casino game of Craps, which uses 6-sided dice. If we were talking
about role-playing games, we might introduce dice based on the Platonic solids which include 4-sided, 6-sided, 8-sided, 12-sided and 20-sided dice. We could introduce other dice with asymmetric sides that include
10-sided or even 100-sided dice. How can we define this in Python?
Here's a roll() function definition that has an optional parameter for the number of sides on the die. If no
value is provided, a default is used, which simulates a 6-sided die. If a value is provided, this is the number
of sides. Note that we don't require a specific kind of dice, and are perfectly willing to roll 11-sided dice if
that's what the game calls for.
import random

def roll( sides= 6 ):
    return random.randrange(1,sides+1)
When you define a function like this in IDLE, you'll notice something very cool happens when you use the
function. When you type roll() a pop-up window appears that says sides=6, displaying the parameter
and the default value.
Rules of the Game. There's an additional rule about positional parameter syntax that can't easily be
captured in our simple grammar depiction. Python requires us to place all of the required parameters before
all of the optional parameters.
This required-before-optional rule can seem capricious. However, the Python program must assign argument values to the parameter variables by position, from left to right.
Imagine the following hypothetical scenario.
def aBadIdea( opt=123, req ):
    some function body
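Python refuses a definition like this before it runs anything. One way to see that safely is to hand the text of the definition to the built-in compile() function. In this sketch the placeholder body is replaced with pass, and the file name "<example>" is just a label.

```python
# The hypothetical definition, kept as text so we can ask Python to check it.
bad_source = "def aBadIdea( opt=123, req ):\n    pass\n"

try:
    compile(bad_source, "<example>", "exec")
    accepted = True
except SyntaxError:
    accepted = False       # an optional parameter may not precede a required one

print(accepted)
```

This prints False, because the required parameter req follows the optional parameter opt.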
3. Finally, the optional parameter variables that didn't get an argument value will be assigned
their default initializer values. Once the parameter variables all have a value assigned, the
function suite can be executed.
Evaluated With The Right Number of Values. The function evaluation provides one argument
value for each parameter variable. This means each required parameter and each optional parameter
will have a value set by an argument. Once the parameter variables all have a value assigned, the
function suite can be executed.
Evaluated With Too Many Values. The function evaluation provides more argument values than
the allowed parameter variables. For now, we have to consider this as an error. Hint: there's a way
to cope with this, but it requires some additional types of collection data that we haven't covered yet.
The full set of rules is something that has to wait until Mappings : The dict.
The "Too Many Values" rule is open to some debate. On one hand, if the arguments don't match the
parameters, something is clearly wrong. On the other hand, it can be useful to specify a function that will
handle an arbitrary number of parameters. The Python language doesn't impose one view or the other; it
allows you to pick a side in the debate. For now, we have to treat too many argument values as an error.
We will, eventually, have the option of coping with this situation.
Here are some additional examples, using a $10 or $24 bet.
>>> payoutFrom( 10, 5 )
50
>>> payoutFrom( 10, 6, 5 )
12
>>> payoutFrom( 24, 5, 6, 0.05 )
19.0
Common Errors. If a required parameter (a parameter without a default value) is missing, this is a basic
TypeError.
Here's an example of a script where we define a function that requires two argument values. We call it with
an incorrect number of arguments to see what happens.
badcall.py
#!/usr/bin/env python
from __future__ import print_function

def hack(a,b):
    print(a+b)

hack(3)
By providing a default of None, the function can determine whether a value was supplied or not supplied.
This allows for complex default handling within the body of the function.
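Here is a minimal sketch of the None-as-default technique, applied to a variant of the roll() function. The choice of 6 as the fallback value is an assumption made for this illustration.

```python
import random

def roll(sides=None):
    """roll(sides) -> number; a missing value means an ordinary 6-sided die."""
    if sides is None:      # no argument value was supplied
        sides = 6          # the body decides what the default should be
    return random.randrange(1, sides + 1)

plain = roll()             # uses the fallback chosen inside the body
big = roll(20)             # uses the supplied value
print(1 <= plain <= 6)
print(1 <= big <= 20)
```

Because the decision is made inside the body, the default could just as easily be computed from other parameters or read from elsewhere.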
Bottom Line. There must be a value for all parameters. The basic rule is that the values of parameters
are set in the order in which they are defined. If an argument value is missing, and the parameter has a
default value, this is used.
These rules define positional parameters: the position is the rule used for assigning argument values when
the function is evaluated.
Positional and Keyword. We have a total of four variations: positional parameters and keyword parameters, both with and without defaults. Positional parameters work well when there are few parameters and
their meaning is obvious. Keyword parameters work best when there are a lot of parameters, especially
when there are optional parameters.
Good use of keyword parameters mandates good selection of keywords. Single-letter parameter names or
obscure abbreviations do not make keyword parameters helpfully informative.
The syntax for providing argument values is very flexible. Here are the semantic rules Python uses to assign
argument values to parameter variables.
1. Keywords. Assign values to all parameters given by name, irrespective of position. If the keyword
on the function evaluation is not an actual parameter variable, raise a TypeError.
2. Positions. Assign values to all remaining parameters by position. It's possible to mistakenly assign a
value by both keyword and position; if so, raise a TypeError.
3. Defaults. Assign defaults for any parameters that don't yet have values and have defaults defined; if
any parameters still lack values, raise a TypeError.
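Each of the three rules can raise a TypeError. This sketch provokes one of each, using a made-up function named averageOf() with one required and one optional parameter.

```python
def averageOf(samples, sides=6):
    """averageOf(samples, sides) -> string; a hypothetical stand-in."""
    return "%d samples of %d-sided dice" % (samples, sides)

problems = 0

try:
    averageOf(200, faces=6)        # rule 1: faces is not a parameter name
except TypeError:
    problems += 1

try:
    averageOf(200, samples=100)    # rule 2: samples is set by position and by keyword
except TypeError:
    problems += 1

try:
    averageOf(sides=6)             # rule 3: samples has no value and no default
except TypeError:
    problems += 1

print(problems)
```

All three calls are rejected, so the count is 3. A correct call such as averageOf(200, sides=8) satisfies all three rules.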
Average Dice. Here's another example with a simple parameter list. We need to know how many samples
to average. The number of sides on each die, however, has an obvious default value of six, because six-sided
dice are so common. However, we'll allow a user to override the number of sides in case they want to simulate
rolls of 4-sided or 12-sided dice.
import random

def averageDice( samples, sides=6 ):
    """Return the average of a number of throws of 2 dice."""
    s = 0
    for i in range(samples):
        d1,d2 = random.randrange(sides)+1,random.randrange(sides)+1
        s += d1+d2
    return float(s)/float(samples)
Next, we'll show a number of different kinds of arguments to this function: keyword, positional, and default.
test1 = averageDice( 200 )
test2 = averageDice( samples=200 )
test3 = averageDice( 200, 6 )
test4 = averageDice( 200, sides=6 )
When the averageDice() function is evaluated to set test1, the positional form is used for samples, and a
default for sides. The second call of the averageDice() function uses the keyword form for samples, and a
default for sides. The third version provides two values positionally. The final version supplies a keyword
value for sides; the value for samples is supplied by position. Python requires positional argument values to be written before any keyword argument values.
Tip: Debugging Keyword Parameters
When you use a function, you have to provide actual argument values for each parameter that doesn't have
an initializer. Two things can go wrong here: the syntax of the function call is incorrect in the first place,
or you haven't provided values to all parameters.
You may have fundamental syntax errors, including mis-matched (), or a misspelled function name.
You can provide argument values by position or by using the parameter name or a mixture of both techniques.
Python will first extract all of the keyword arguments and set the parameter values. After that, it will match
up positional parameters in order. Finally, default values will be applied. There are several circumstances
where things can go wrong.
A parameter is not set by keyword, position or default value.
There are too many positional values.
A keyword is used that is not a parameter name in the function definition.
Create a function roll() that creates two dice values from 1 to 6 and returns their sum. The sum of
two dice will be a value from 2 to 12.
Create a main program that calls roll() to get a dice value, then calls field() with the value that
is rolled to get the payout amount. Compute the average of several hundred experiments.
2. Which is Clearer?
How do keyword parameters help with design, programming and testing? Which is clearer, positional
parameter assignment or keyword assignment? Should one technique be used exclusively? What are
the benefits and pitfalls of each variation?
All of the Python data types we're going to introduce in Basic Sequential Collections of Data will use method
functions. This section will focus on the basic principles of how you use method functions. As with ordinary
functions, you need to know how to use them before you can design them.
The syntax for using (or calling) a method function looks like this:
someObject.aMethod( [ [ parameter= ] argument ] , ... )
A single . connects the owning object (someObject) with the method name (aMethod()). As with a function,
the () are essential to mark this as a method function evaluation.
The [ [ parameter= ] argument ]s indicate that the parameter keywords are permitted. Also, the [ and ]s
indicate that in general argument values are optional. Some method functions will compute results based
on the object itself, not on arguments to the function. The ... means that the argument values are repeated.
The , is the separator between argument values.
We have to make an important distinction here between the syntax and the semantics of using a function:
The syntax summary says that we can have any number of argument values.
Semantically, however, the argument values will be matched against the declared list of parameter
variables. If we provide too many values or too few values, we'll get an error.
It's important to note that we can't capture all of the semantics in our syntax summaries. Consequently, we
have to watch out for any of Python's additional rules.
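Here is a small sketch of the difference between syntax and semantics. The call below is written correctly, but the string method lower() accepts no argument values, so Python objects when the call is evaluated.

```python
try:
    "Hi Mom".lower("extra")     # legal syntax, but one argument value too many
    outcome = "no error"
except TypeError:
    outcome = "TypeError"

print(outcome)
```

The syntax summary allowed the call; the method's own list of parameters did not.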
Two Small Examples. Here are two examples of how we apply these method functions to string objects.
>>> "Hi Mom".lower()
'hi mom'
>>> "The Walrus".upper()
'THE WALRUS'
In this example, we apply the lower() method function of the string object "Hi Mom".
We apply the upper() method function of the string object "The Walrus".
When we looked at the math and random modules in Meaningful Chunks and Modules, we were looking at
module functions. These module functions are imported as part of a module; that's why their names are
qualified by the name of the owning module. When we import math, we use the qualified name math.sqrt().
The syntax of object method functions follows the module function pattern.
Modules and objects are two examples of the principle of encapsulation. There are numerous differences
between objects and modules, and we'll look at these more closely when it's appropriate. The important
similarity is that both modules and objects are containers of functions. Modules contain functions and
objects contain method functions.
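Side by side, the two kinds of qualified names look like this. The variable names root and shout are ours.

```python
import math

root = math.sqrt(16.0)          # module function: module name, dot, function name
shout = "The Walrus".upper()    # method function: object, dot, method name

print(root)
print(shout)
```

In both cases the name before the dot tells Python which container to look in for the function.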
Bottom Line. We want to be able to use method functions starting with Basic Sequential Collections
of Data. Once we've learned how to use method functions, we'll show how you create classes and method
functions in Defining New Objects. We'll show how you create modules and module functions in Modules :
The unit of software packaging and assembly.
def_max(a,_b):
____if_a_>=_b:
________m_=_a
____if_b_>=_a:
________m_=_b
____return m
In other languages (notably Visual Basic), it is common to prefix variables with complex codes that indicate
the scope and type of the variable. This is sometimes called "Hungarian Notation" because there's a kind
of family name given first.
Because Python is object-oriented, these kinds of prefix codes will be inaccurate or incomplete. Also, Python
strives for an English-like look, and short, cryptic prefixes interfere with this look. Python parameter names
should be clear, short words that work well as keywords.
Formatting. Blank lines are used sparingly in a Python file, generally to separate unrelated material.
Typically, function definitions are separated by single blank lines. A long or complex function might have
blank lines within the body. When this is the case, it might be worth considering breaking the function into
separate pieces.
The first line of the body of a function is called a docstring. The recommended forms for docstrings are
described in Python Enhancement Proposal (PEP) 257.
Typically, the first line of the docstring is a pithy summary of the function. This may be followed by a blank
line and more detailed information. The one-line summary should be a complete sentence.
def fact( n ):
    """fact( number ) -> number

    Returns the number of permutations of n things."""
    if n == 0: return 1L
    return n*fact(n-1L)

def bico( n, r ):
    """bico( number, number ) -> number
Getting Help. The docstring can be retrieved with the help() function.
help(object) string
Prints help on the specific object. For functions, classes or modules, this prints the object's docstring.
For a variable, it prints the value of the variable.
When executing help() while using Python directly (not using IDLE), you'll be interacting with a
help viewer that allows you to scroll forward and back through the text.
For more information on the help viewer, see Getting Help.
Here's an example, based on our fact() shown above.
>>> help(fact)
Help on function fact in module __main__:
fact(n)
fact( number ) -> number
Returns the number of permutations of n things.
rolldice.py
#!/usr/bin/env python
from __future__ import print_function
import random

def rollDice():
    return 1 + random.randrange(6), 1 + random.randrange(6)

d1,d2=rollDice()
print(d1, d2)
This can simplify a number of previous examples. In particular, look at Roll Dice Until Craps in The while
Statement and Counting Sevens for examples that can be simplified by using this rollDice() function.
Important: Debugging
A function that returns multiple values is rather specialized. For now, it can only be used in a multiple
assignment statement. When we learn more about tuples (in Doubles, Triples, Quadruples : The tuple),
we'll see how we can do a few additional things with these kinds of functions.
The number of variables on the left-hand side of the multiple assignment statement must match the number
of values on the return statement of the function. It helps to emphasize this in the function's docstring, so
that it is perfectly clear how many values the function returns.
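Here is a sketch of what a mismatch looks like. The docstring wording is our own suggestion for recording how many values come back.

```python
import random

def rollDice():
    """rollDice() -> (d1, d2): always returns exactly two values."""
    return 1 + random.randrange(6), 1 + random.randrange(6)

d1, d2 = rollDice()              # two names for two values: this works

try:
    d1, d2, d3 = rollDice()      # three names for two values
    mismatch = False
except ValueError:
    mismatch = True

print(mismatch)
```

The second assignment raises a ValueError because there are more names on the left than values returned on the right.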
Since the local namespace of the function is searched first, names are understood locally. Searching other,
non-local namespaces, is a kind of fall-back plan when the variable is not found in the local namespace.
Generally, we write our functions so that all the variables are either parameters or variables created inside
the function. Rather than burn up brain calories trying to work out the namespace that provides needed
variables, we strive to be sure all names are local.
Nested Functions. Consider the following incomplete script. This doesn't really do much except show an
outline of how programs are often defined as multiple functions. We'll look at the three nested contexts from
outermost to innermost.
def rolldice( dice, sides=6 ):
    pass  # do some work

def average( rolls, dice ):
    for i in range(rolls):
        r= rolldice( dice, sides )
    for i in range(rolls):
        r2= rolldice( 2*dice, sides )

rolls=10
sides=8
average( rolls*12, 2 )
1. The main script executes in the global namespace. It defines two functions, rolldice() and average().
Then it defines two global variables, rolls and sides. Finally, it evaluates one of those functions,
average().
2. The average() function has a local namespace, where five variables are defined. Two of these are
parameter variables: rolls, dice. The rest are ordinary variables i, r, and r2. When average() is called
from the main script, the local rolls will hide the global variable with the same name. The global rolls
is 10, but the local value is 10*12. Can you see why?
The reference to sides is not resolved in the local namespace, but is resolved in the global namespace.
This is called a free variable, and is generally a symptom of poor software design.
3. The rolldice() function (which has its suite of statements omitted) has a local namespace, where
two parameter variables are defined: dice and sides. When rolldice() is called from average(),
there are three nested scopes that define the environment: the local namespace for rolldice(), the
local namespace for average(), and the global namespace for the main script.
The local variables for rolldice() hide variables declared in other namespaces. The local dice hides
the variable with the same name in average(). The local sides hides the global variable with the same
name.
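To watch a local name hide a global one, here is a cut-down sketch of the script above. This version of average() does nothing except report the value of its own rolls.

```python
rolls = 10                       # a global variable

def average(rolls, dice):
    # Inside the function, rolls is a parameter; it hides the global rolls.
    return rolls

inside = average(rolls * 12, 2)  # the argument expression uses the global rolls
print(inside)
print(rolls)
```

The argument expression rolls*12 is evaluated in the global namespace, giving 120, and that value is assigned to the local rolls. The global rolls is still 10 afterwards.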
Functions for Looking At Namespaces. If you evaluate the built-in function globals(), you'll see the
mapping that contains all of the global variables Python knows about. For these early programs, all of our
variables are global.
If you evaluate the built-in function locals(), you'll see the same variables as you will from globals()
because the top-level Python window interprets your input in the global namespace. However, if you evaluate
the locals() function from within the body of a function, you'll be able to see the difference between local
and global namespaces.
The following example shows the creation of a global variable a, and a global function, q.
>>> a=22.0
>>> globals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', '__doc__': None, 'a': 22.0}
>>> def q(x,y):
...     a = x/y
...     print(locals())
...
>>> locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', 'q': <function q at 0x6feb0>, '__doc__'
>>> globals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', 'q': <function q at 0x6feb0>, '__doc__'
>>> q(22.0,7.0)
{'a': 3.1428571428571428, 'y': 7.0, 'x': 22.0}
1. When we evaluate globals() initially, it has some __builtin__ objects, plus our variable a.
2. In our function q(), we print the value of locals() to see what's in the local namespace while q() is
being evaluated.
3. We show the result of locals() and globals(). At the top-level of Python, they're the same.
4. When we evaluate q(), we see that the locals inside q() are just the parameters, x and y, plus the local variable a.
A built-in function vars() accepts a parameter which is the name of a specific local context: a module,
class, or object. It returns the local variables for that specific context. It turns out that the local variables
are kept in a Python internal object named __dict__. The vars() function retrieves this information.
The function dir() also examines the internal __dict__ object for a specific object; it will locate all local
variables as well as other features of the object.
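As a sketch, here is what vars() and dir() report for the math module. The name math_names is ours.

```python
import math

math_names = vars(math)                  # the module's namespace, as a mapping
print("sqrt" in math_names)              # sqrt is one of the names defined in math
print("sqrt" in dir(math))               # dir() finds the same name
print(vars(math) is math.__dict__)       # vars() hands back the __dict__ object itself
```

All three lines print True.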
Assignment statements, as well as def and class statements, create names in the local dictionary. The del
statement removes a name from the local dictionary.
Important: Debugging
There are two big problems people have with namespaces. First, they forget that variables belong to a specific
namespace, and try to use variables as though they exist globally. Some languages (COBOL, original BASIC)
assume that all variables are global. In languages like C and Pascal, it is relatively easy to declare a global
variable. Python tries to avoid the kinds of problems that are caused by the hidden coupling that global
variables cause.
The other problem is failing to include the module name to refer to an imported function definition. When
we say import math, the math module is created with its own namespace, and all the def statements that
are imported execute in the math module's namespace. Because of this, we have to say math.sqrt, including
the module name in front of the function name.
This second problem may stem from failing to note what happens with the import statement. If we type a
definition directly at the >>>, it is defined in the global namespace. If, however, we import a module with
the definition, the def statement executes in the module's namespace. Since the definition happens in the
module's namespace, we have to qualify the function name with the module name.
It turns out that a function object can be used in three very different ways, depending on the context in
which the name occurs.
We can apply the function when we follow the name with ()s.
We can also create an alias for a function by slapping another variable name on the object.
And, we can assign additional attributes to the function, above and beyond the name and the docstring.
Apply The Function. By far, the most common use for a function object is to use ()s to apply the
function to argument values. This is what we've seen in detail throughout this part. This is the ordinary
use for functions.
This explains why a function with no argument values still needs empty ()s. The ()s are the syntax that
tells Python to evaluate the function.
You can think of the ()s as a kind of operator. This () operator applies a function object to the argument
values.
function ( [ [ parameter = ] argument ] , ... )
Alias The Function. When we use the name of a function without any ()s, we are not applying the
function to argument values. We're talking about the function; we're not asking the function to do anything.
When we leave off the ()s, we're making the function into a noun. It's the difference between talking about
the verb "to write" and actually writing a note to someone.
One way that we talk about a function is to assign another name to the function. This creates an alias for
the function. This can be dangerous, because it can make a program obscure. However, it can also simplify
the evolution and enhancement of software. We have to cover it because it is a very common technique.
Imagine that the first version of our program had two functions named rollDie() and rollDice(). The
definitions might look like the following.
rolldice.py First Version
import random

def rollDie():
    return random.randrange(1,7)

def rollDice():
    return random.randrange(1,7) + random.randrange(1,7)
When we wanted to expand our program to handle five-dice games, we realized we could generalize this
rollDice() function. Here's our new, slick, expanded function that rolls any number of dice.
def rollNDice( n=2 ):
    t= 0
    for d in range(n):
        t += random.randrange( 1, 7 )
    return t
It is important to remove the duplicated algorithm in all three versions of our dice rolling function. Since
the original rollDie() and rollDice() are just special cases of rollNDice(), we should replace them with
something like the following.
rolldice.py Second Version
def rollDie():
    return rollNDice( 1 )

def rollDice():
    return rollNDice()
This revised definition of rollDice() is really just another name for rollNDice(). We can see that
our definition of rollDice() doesn't add anything new. Compare it with rollDie(), which supplies an
argument value to the rollNDice() function.
Because a function is an object assigned to a name, we can have multiple names for a function. Here's how
we create an alias to a function.
rollDice = rollNDice
It turns out that evaluating this kind of local function variable is slightly faster than evaluating the qualified
name. This is because the qualification requires Python to look up the function name in the module's
namespace, an operation that requires a tiny atom of additional time. Consequently, you'll see this little
optimization technique in many Python programs.
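Here is a small sketch of both kinds of alias, assuming Python 3. The rollMany() function and its local name randrange are hypothetical, added only to illustrate the local-alias optimization; they are not part of rolldice.py.

```python
import random

def rollNDice(n=2):
    """Roll n six-sided dice and return the total."""
    total = 0
    for d in range(n):
        total += random.randrange(1, 7)
    return total

# An alias: a second name bound to the same function object.
rollDice = rollNDice

def rollMany(count):
    """Return a list of count individual die rolls (hypothetical helper)."""
    # Local alias for the qualified name: the module lookup happens once,
    # not once per roll.
    randrange = random.randrange
    return [randrange(1, 7) for i in range(count)]

print(rollDice is rollNDice)   # True
print(rollDice())
print(rollMany(5))
```

The saving from the local alias is tiny; it tends to matter only inside loops that run a very large number of times.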
Get Attributes of the Function. A function object has a number of attributes. We can interrogate
those attributes, and to a limited extent, we can change some of these attributes. For more information,
see section 3.2 of the Python Language Reference [PythonRef] and section 2.3.9.3 of the Python Library
Reference [PythonLib].
__doc__ Docstring from the first line of the function's body.
__name__ Function name from the def statement.
__module__ Name of the module in which the function name was defined.
func_defaults Tuple with default values to be assigned to each argument that has a default
value. This is a subset of the parameters, starting with the first parameter that has a default
value.
func_code The actual code object that is the suite of statements in the body of this function.
func_globals The dictionary that defines the global namespace for the module that defines this
function. This is m.__dict__ of the module which defined this function.
func_dict, __dict__ The dictionary that defines the local namespace for the attributes of this function.
You can set and get your own function attributes, also.
def rollDie():
    return random.randrange(1,7)

rollDie.version= "1.0"
rollDie.author= "sfl"
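The following sketch reads several of these attributes back, assuming Python 3, where func_defaults is spelled __defaults__. The one-line body of rollNDice() is a compact variant written for this example.

```python
import random

def rollNDice(n=2):
    """Roll n dice and return the total."""
    return sum(random.randrange(1, 7) for d in range(n))

# Our own attributes are stored in the function's __dict__.
rollNDice.version = "1.0"
rollNDice.author = "sfl"

print(rollNDice.__name__)       # rollNDice
print(rollNDice.__doc__)        # Roll n dice and return the total.
print(rollNDice.__defaults__)   # (2,)
print(rollNDice.__dict__)       # the version and author attributes
```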
And yes, the integration exercise is almost calculus. But really, it's just the sum of the areas of a bunch of
rectangles, so it's inside the box of algebra.
1. Maximum Value of a Function.
Given some integer-valued function f(), we want to know what value of x has the largest value for f()
in some interval of values. For additional insight, see [Dijkstra76].
Imagine we have an integer function of an integer, call it f(). Here are some examples of this kind of
function.
def f1(x): return x
def f2(x): return -5/3*x-3
def f3(x): return -5*x*x+2*x-3
The question we want to answer is what value of x in some fixed interval returns the largest value for
the given function? In the case of the first example, def f1(x): return x, the largest value of f1()
in the interval 0 ≤ x < 10 occurs when x is 9.
What about f3() in the range -10 ≤ x < 10?
Max of a Function, F, in the interval low to high
(a) Initialize.
x ← low;
max ← x;
maxF ← F(max).
(b) Loop. While low ≤ x < high.
i. New Max? If F(x) > maxF :
max ← x;
maxF ← F(max).
ii. Next X. Increment x by 1.
(c) Return. Return max as the value at which F(x) had the largest value.
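The steps above translate almost line for line into Python. This is one possible sketch, not the only solution to the exercise. The function name max_of is made up, and the variable is called best rather than max so that the built-in max() function isn't hidden.

```python
def max_of(F, low, high):
    """Return the x in low <= x < high at which F(x) is largest."""
    # Initialize.
    x = low
    best = x
    maxF = F(best)
    # Loop.
    while low <= x < high:
        # New Max?
        if F(x) > maxF:
            best = x
            maxF = F(best)
        # Next X.
        x += 1
    # Return.
    return best

def f1(x): return x
def f3(x): return -5*x*x + 2*x - 3

print(max_of(f1, 0, 10))     # 9
print(max_of(f3, -10, 10))   # 0
```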
2. Integration.
This is a simple rectangular rule for finding the area under a curve which is continuous on some closed
interval.
We will define some function which we will integrate, call it f(). Here are some examples.
from math import exp, sin
def f1(x): return x*x
def f2(x): return 0.5 * x * x
def f3(x): return exp( x )
def f4(x): return 5 * sin( x )
When we specify y = f(x), we are specifying two dimensions. The y is given by the function's values.
The x dimension is given by some interval. If you draw the function's curve, you put two limits on the
x axis; this is one set of boundaries. The space between the curve and the x axis is the other boundary.
The x axis limits are a and b. We subdivide this interval into s rectangles; the width of each is $h = \frac{b - a}{s}$.
We take the function's value at the corner as the average height of the curve over that interval. If the
interval is small enough, this is reasonably accurate.
Integrate a Function, F, in the interval a to b in s steps
(a) Initialize.
x ← a;
h ← (b − a) / s;
sum ← 0.0.
(b) Loop. While a ≤ x < b.
i. Update Sum. Increment sum by F(x) × h.
ii. Next X. Increment x by h.
(c) Return. Return sum as the area under the curve F() for a ≤ x < b.
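One way to sketch this rule in Python follows. It departs from the steps above in one respect: instead of testing a ≤ x < b after repeatedly adding h, it counts off exactly s rectangles, an assumption made here so that floating-point rounding in x can't add or drop a rectangle. The name integrate and the variable total are made up for this example.

```python
from math import sin, pi

def integrate(F, a, b, s):
    """Rectangle-rule area under F for a <= x < b, using s rectangles."""
    h = (b - a) / float(s)      # width of each rectangle
    total = 0.0
    for i in range(s):
        x = a + i * h           # left-hand corner of rectangle i
        total += F(x) * h       # height times width
    return total

def f1(x): return x * x
def f4(x): return 5 * sin(x)

print(integrate(f1, 0.0, 1.0, 1000))   # close to 1/3
print(integrate(f4, 0.0, pi, 1000))    # close to 10
```

Increasing s makes the rectangles narrower and the result closer to the true area, at the cost of more evaluations of F.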
The global statement tells Python that the following names are part of the global namespace, not the local
namespace. The following example shows two functions that share a global variable.
ratePerHour= 45.50

def cost( hours ):
    global ratePerHour
    return hours * ratePerHour

def laborMaterials( hours, materials ):
    return cost(hours) + materials
CHAPTER
EIGHT
* List. Flexible Sequences : The list and Common List Design Patterns.
Set. This kind of collection doesn't identify items by position or a key; it simply collects the
items. Collecting Items : The set.
Mapping. This kind of collection identifies items by a key value; there's no particular order to
the items.
* Dictionary. Currently, this is the only type of mapping. Mappings : The dict.
File. A file is what we use to make our data structures persistent by writing them to devices like
hard disks or removable USB drives. External Data and Files and Files, Contexts and Patterns
of Processing. Even something as remote-sounding as a file available in the Internet, identified by
its URL, can be used as if it were a simple file. File-Related Library Modules.
Other.
* Exception. The Unexpected : The try and except statements. Exceptions are part of event-driven
programming. These break us out of the strictly sequential mode that our programs
normally use.
* Generator Functions and Iterators. Looping Back : Iterators, the for statement and Generators.
This chapter will give us a number of very cool techniques that we can use with the
for statement.
* Function. We started in Organizing Programs with Function Definitions. We'll add details in
Defining More Flexible Functions with Mappings.
* Class. Data + Processing = Objects. Since this gets an entire part, not just a chapter, you
can guess that this is a big deal.
* Module. Modules : The unit of software packaging and assembly. Likewise, modules will be
a pretty big deal.
CHAPTER
NINE
Python has a rich family tree of collections. This part will focus on the sequential collections; Collecting
Items in Sequence will introduce the features that are common to all of the types of sequences.
In Sequences of Characters : str and Unicode we describe the string subclass of sequence. The exercises
include some challenging string manipulations.
We describe fixed-length sequences, called tuples, in Doubles, Triples, Quadruples : The tuple. Because tuples
are quite simple, they give us an opportunity to digress and introduce some basic kinds of algorithms commonly
used for statistical processing. The exercises include Translating From Math To Python: Conjugating
The Verb "To Sigma", which describes how to approach writing programs for doing statistical calculations.
In Flexible Sequences : The list we describe the variable-length sequence, called a list. Lists are one of the
cool features that set Python apart from other programming languages. The exercises at the end of the list
section include both simple and relatively sophisticated problems.
We'll cover some advanced features of the list in Common List Design Patterns. This chapter includes some
common techniques for creating useful data structures out of the basic tools we have at our disposal. It will
cover the common need to sort a list into order. We'll also cover multi-dimensional structures: moving from
mathematical vectors to matrices.
position   item
0          3.14159
1          'two words'
2          2048
3          (1+2j)
Sequences are used internally by Python. A number of statements and functions we have covered have
sequence-related features. We'll revisit a number of functions and statements to add the power of sequences
to them. In particular, the for statement is something we glossed over in The for Statement.
The idea that a for statement processes items in a particular order, and a sequence stores items in order is
an important connection. As we learn more about these data structures, well see that the processing and
the data are almost inseparable.
It turns out that the range() function that we introduced generates a sequence object. You can see this
object when you do the following:
>>> range(6)
[0, 1, 2, 3, 4, 5]
>>> range(1,7)
[1, 2, 3, 4, 5, 6]
>>> range(2,36,3)
[2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35]
We'll look at the range() function and how it generates list objects in detail in Flexible Sequences : The
list.
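If you are following along in Python 3 rather than the Python 2 this book assumes, range() returns a compact range object instead of a list. The short sketch below, under that assumption, uses list() to make the items visible.

```python
# In Python 3, range() produces its items on demand.
r = range(2, 36, 3)
print(r)                     # range(2, 36, 3)

# list() builds the actual list of items.
print(list(r))               # [2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35]

# The range object still behaves like a sequence.
print(r[0], r[-1], len(r))   # 2 35 12
```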
Compute Mean. The mean is the sum of the samples divided by the count of the samples. The sum
is a reduction from the collection of outcomes, as is the count.
To compute the sum and the count, we must have a collection of individual results from playing
Roulette.
Create Sample Collection. To create the samples, we have to simulate our betting strategy enough
times to have meaningful statistics. We'll use an iteration to create a collection of 100 individual
outcomes of playing our strategy. Each outcome is the result of one session of playing Roulette.
In order to collect 100 outcomes, we'll need to create each individual outcome. Each outcome is based
on placing and resolving bets.
Resolve Bets. We apply the rules of Roulette to determine if the bet was a winner (and how
much it won) or if the bet was a loser.
Before we can resolve a bet, we have to spin the wheel. And before we spin the wheel, we have to
place a bet.
Spin Wheel. We generate a random result. We increase the number of spins we've played.
In order for the spin to have any meaning, of course, we'll need to have some bets placed.
Place Bets. We use our betting strategy to determine what bet we will make and how much we
will bet. For example, in the Martingale system, we bet on just one color. We double our bet
when we lose and reset our bet to one unit when we win. Note that there are table limits, also,
that will limit the largest bet we can place.
When we reverse these steps, we have a very typical program that creates a sequence of samples and analyzes
that sequence of samples.
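A rough sketch of what the reversed steps might look like as a program, assuming Python 3. Every specific here is an assumption made for illustration: an American wheel with 38 bins of which 18 are red, sessions of 50 spins, a table limit of 100 units, and the names play_session, stake and rng.

```python
import random

def play_session(spins=50, table_limit=100, rng=random):
    """One session of Martingale betting on red; returns net winnings in units."""
    bet, net = 1, 0
    for spin in range(spins):
        # Place Bets: the Martingale stake, capped by the table limit.
        stake = min(bet, table_limit)
        # Spin Wheel: assumed 38 bins, 18 of them red.
        won = rng.randrange(38) < 18
        # Resolve Bets: even money on a win, double the bet after a loss.
        if won:
            net += stake
            bet = 1
        else:
            net -= stake
            bet = stake * 2
    return net

# Create Sample Collection: 100 independent sessions.
samples = [play_session() for i in range(100)]

# Compute Mean: the sum of the samples divided by the count of the samples.
mean = sum(samples) / float(len(samples))
print(mean)
```

The rng parameter exists only so the wheel can be replaced with a predictable stand-in when checking the betting logic.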
Other typical forms for programs may include reading a sequence of data items from files, something we'll
turn to in later chapters. Some programs may be part of a web application, and process sequences that come
from user input on a web form.
objects do have some common kinds of features. In the next section, we'll look at all of the features that are
common among these sequence subspecies.
A great deal of Python's internals are sequence-based. Here are just a few examples:
* The for statement, in particular, expects a sequence, and we often create a list with the range()
function.
* When we split a str using the split() method, we get a list of substrings.
* When we define a function, we can have positional parameters collected into a sequence, something
we'll cover in Mappings : The dict.
index
start
end
This identifies a subsequence of items with positions from start to end-1. This creates a new sequence
which is a slice of the original sequence; there will be end - start items in the resulting sequence.
Items are identified by their position numbers. The position numbers start with zero at the beginning of the
sequence.
Important: Numbering From Zero
Newbies are often tripped up because items in a sequence are numbered from zero. This leads to a small
disconnect between cardinal numbers and ordinal names.
The ordinal names are words like "first", "second" and "third". The cardinal numbers used for these positions
are 0, 1 and 2. We have two choices to try and reconcile these two identifiers:
* Remember that the ordinal names are always one too big. The "third" item is in position 2.
* Try to use the word "zeroth" (or "zeroeth") for the item in position 0.
In this book, we'll use conventional ordinal names starting with "first", and emphasize that this is position
0 in the sequence.
Positions are also numbered from the end of the sequence as well as the beginning. Position -1 is the last
item of the sequence, -2 is the next-to-last item.
Important: Numbering In Reverse
Experienced programmers are often tripped up because Python identifies items in a sequence from the right
using negative numbers, as well as from the left using positive numbers. This means that each item in a
sequence actually has two numeric indexes.
Here's a depiction of a sequence of four items. Each item has a position that identifies the item in the
sequence. We'll also show the reverse position numbers.
forward position   reverse position   item
0                  -4                 3.14159
1                  -3                 'two words'
2                  -2                 2048
3                  -1                 (1+2j)
Why do we have two different ways of identifying each position in the sequence? If you want, you can think
of it as a handy short-hand. The last item in any sequence, S, can be identified by the formula S[len(S)-1].
For example, if we have a sequence with 4 items, the last item is in position 3. Rather than write
S[len(S)-1], Python lets us simplify this to S[-1].
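The two numbering schemes can be checked directly. This small sketch uses the four items from the depiction above; the variable names S, forward and reverse are chosen for the illustration, and Python 3 is assumed.

```python
S = [3.14159, 'two words', 2048, (1+2j)]

# Each item has a forward position and a matching reverse position.
for forward in range(len(S)):
    reverse = forward - len(S)
    print(forward, reverse, S[forward] == S[reverse])

# The short-hand for the last item.
print(S[len(S)-1] == S[-1])   # True
```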
Factory Functions. There are also built-in factory (or conversion) functions for the sequence objects.
These are ways to create sequences from other kinds of data.
str(object) → string
Creates a string from the object. This provides a human-friendly string representation of really complex
objects. There is another string factory function, repr, which creates a Python-friendly representation
of an object. We'll return to this in Sequences of Characters : str and Unicode.
unicode(object) → unicode
Creates a Unicode string from the object.
list(sequence) → list
Return a new list whose items are the same as those of the argument sequence. Generally, this is
used to convert immutable tuples to mutable lists.
tuple(sequence) → tuple
Return a new tuple whose items are the same as those of the argument sequence. If the argument
is a tuple, the return value is the same object. Generally, this is used to convert mutable lists into
immutable tuples.
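A brief sketch of these conversions in use, assuming Python 3 (where unicode() no longer exists because str already holds Unicode). The variable names point, work and frozen are invented for the example.

```python
# tuple -> list so an item can be changed, then back to a tuple.
point = (3, 4)
work = list(point)
work[0] = 30
frozen = tuple(work)
print(work, frozen)              # [30, 4] (30, 4)

# tuple() applied to a tuple hands back the same object.
print(tuple(frozen) is frozen)   # True

# str() is the human-friendly form; repr() is the Python-source form.
print(str('two words'))          # two words
print(repr('two words'))         # 'two words'
```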
Accessor Functions. There are several built-in accessor functions which return information about a
sequence.
These functions apply to all varieties of lists, strings and tuples.
min(iterable) → item
Return the item which is least in the iterable (sequence, set or mapping).
9.1. Collecting Items in Sequence
max(iterable) → item
Return the item which is greatest in the iterable (sequence, set or mapping).
len(iterable) → number
Return the number of items in the iterable (sequence, set or mapping).
These functions hint at a generalization. A sequence, it turns out, is a kind of iterable object. These functions
apply to any iterable. We'll look at this generalization in Looping Back : Iterators, the for statement and
Generators.
Aggregation Functions. The following functions create an aggregate value from a sequence of values. In
the case of sum() it must be a sequence of numbers. In the case of any() and all(), it must be a sequence
of boolean values.
Applying any() or all() to a non-empty string is silly and always returns True.
Why? [Hint: do bool('a') to see what an individual character's truth value is.]
Similarly, applying sum() to a sequence that isn't all numbers raises a TypeError exception.
sum(iterable) → number
Sum the values in the iterable (set, sequence, mapping). All of the values must be numeric.
all(iterable) → boolean
Return True if all values in the iterable (set, sequence, mapping) are equivalent to True.
any(iterable) → boolean
Return True if any value in the iterable (set, sequence, mapping) is equivalent to True.
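Here is a short sketch of the accessor and aggregation functions applied to one small list, assuming Python 3. The sample values and the names samples and wins are made up.

```python
samples = [3, -1, 4, -1, 5]
print(len(samples), min(samples), max(samples))   # 5 -1 5
print(sum(samples))                               # 10

# any() and all() want a sequence of boolean values.
wins = [x > 0 for x in samples]
print(any(wins), all(wins))       # True False

# Each character of a string counts as True.
print(bool('a'))                  # True
print(all('abc'), any('abc'))     # True True

# sum() insists on numbers.
try:
    sum([1, 'two', 3])
except TypeError as e:
    print('TypeError:', e)
```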
position    0  1  2  3  4  5  6  7  8  9
character   s  y  n  c  o  p  a  t  e  d
We get string objects from external devices like the keyboard, files or the network. We present strings to
users either as files or on the GUI display. The print statement converts data to a string before showing it
to the user. This means that printing a number really involves converting the number to a string of digits
before printing the string of digit characters.
Often, our program will need to examine input strings to be sure they are valid. We may be checking a
string to see if it is a legal name for a day of the week. Or, we may do a more complex examination to
confirm that it is a valid time. There are a number of validations we may have to perform.
Our computations may involve numbers derived from input strings. Consequently, we may have to convert
input strings to numbers or convert numbers to strings for presentation.
a A simple string.
apos A string using ". It has an ' inside it.
quote A string using '. It has two " inside it.
doc_1 This is a six-line string.
Use repr(doc_1) to see how many lines it has. Better, use doc_1.splitlines().
novel This is a one-line string with both " and ' inside it.
Non-Printing Characters Really! [How can it be a character and not have a printed representation?]
ASCII has a few dozen characters that are intended to control devices or adjust spacing on a printed
document.
There are a few commonly-used non-printing characters: mostly tab and newline. One of the most common
escapes is \n which represents the non-printing newline character that appears at the end of every line of
a file in GNU/Linux or MacOS. Windows, often, will use a two-character end-of-line sequence encoded as
\r\n. Most of our editing tools quietly use either line-ending sequence.
These non-printing characters are created using escapes. A table of escapes is provided below. Normally,
the Python compiler translates the escape into the appropriate non-printing character.
Here are a couple of literal strings with a \n character to encode a line break in the middle of the string.
'The first message.\nFollowed by another message.'
"postmarked forestland\nconfigures longitudes."
Python supports a broad selection of \ escapes. These are printed representations for unprintable ASCII
characters. They're called escapes because the \ is an escape from the usual meaning of the following
character. We have very little use for most of these ASCII escapes. The newline (\n), backslash (\\),
apostrophe (\') and quote (\") escapes are handy to have.
Important: Escapes Become Single Characters
We type two (or more) characters to create an escape, but Python compiles this into a single character in
our program.
In the most common case, we type \n and Python translates this into a single ASCII character that doesn't
exist on our keyboard.
Since \ is always the first of two (or more) characters, what if we want a plain-old \ as the single resulting
character? How do we stop this escape business?
The answer is we don't. When we type \\, Python puts a single \ in our program. Okay, it's clunky, but
it's a character that isn't used all that often. The few times we need it, we can cope. Further, Python has
a raw mode that permits us to bypass these escapes.
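The len() function gives a quick way to confirm that an escape becomes a single character. A minimal sketch, assuming Python 3:

```python
print(len('\n'))      # 1: backslash-n becomes one newline character
print(len('\\'))      # 1: two typed characters, one backslash in the string
print(len('\x41'), '\x41' == 'A')   # 1 True: a hexadecimal escape
print(len(r'\n'))     # 2: raw mode keeps the backslash and the n

# The newline escape breaks the output into two lines.
print('The first message.\nFollowed by another message.')
```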
Escape   Meaning
\\       Backslash (\)
\'       Apostrophe (')
\"       Quote (")
\a       Audible Signal; the ASCII code called BEL. Some OSs translate this to a screen flash or ignore it completely.
\b       Backspace (ASCII BS)
\f       Formfeed (ASCII FF). On a paper-based printer, this would move to the top of the next page.
\n       Linefeed (ASCII LF), also known as newline. This would move the paper up one line.
\r       Carriage Return (ASCII CR). On a paper-based printer, this returned the print carriage to the start of the line.
\t       Horizontal Tab (ASCII TAB)
\ooo     An ASCII character with the given octal value. The ooo is any octal number.
\xhh     An ASCII character with the given hexadecimal value. The x is required. The hh is any hex number.
We can also use a \ at the end of a line, which means that the end-of-line is ignored. The string continues
on the next line, skipping over the line break. Here's an example of a single string that was so long we had to
break it into multiple lines.
"A manuscript so long \
that it takes more than one \
line to finish it."
Why would we have this special dangling-backslash? Compare the previous example with the following.
What's the difference? Enter them both into IDLE to see what Python displays. One string represents a
single line of data, where the other string represents three lines of data. Since the \ escapes the meaning of
the newline character, it vanishes from the string. This gives us a very fine degree of control over how our
output looks.
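The contrast can be sketched as follows, assuming Python 3. The second string here is an assumed stand-in written with explicit \n escapes; it is not necessarily the comparison string the text refers to. The names one_line and three_lines are made up.

```python
# A dangling backslash hides each line break from Python.
one_line = "A manuscript so long \
that it takes more than one \
line to finish it."

# Assumed stand-in for the contrasting string: real newline characters.
three_lines = "A manuscript so long\nthat it takes more than one\nline to finish it."

print(len(one_line.splitlines()))      # 1
print(len(three_lines.splitlines()))   # 3
print(three_lines)
```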
Also note that adjacent strings are automatically put together to make a longer string. We won't make much
use of this, but it is something that you may encounter when reading someone else's programs.
"syn" "opti" "cal" is the same as "synoptical".
Unicode Strings. If a u or U is put in front of the string (for example, u"unicode"), this indicates a Unicode
string. Without the u, it is an ASCII string. Unicode refers to the Universal Character Set; each character
requires from 1 to 4 bytes of storage. ASCII is a single-byte character set; each ASCII character
requires a single byte of storage. Unicode permits any character in any of the languages in common use
around the world.
For the thousands of Unicode characters that are not on our computer keyboards, a special \uxxxx escape
is provided. This requires the four digit Unicode character identification. For example, 日本 is made up
of Unicode characters U+65e5 and U+672c. In Python, we write this string as u'\u65e5\u672c'.
Here's an example that shows the internal representation and the easy-to-read output of this string. This
will work nicely if you have an appropriate Unicode font installed on your computer. If this doesn't work,
you'll need to do an operating system upgrade to get Unicode support.
>>> ch= u'\u65e5\u672c'
>>> ch
u'\u65e5\u672c'
>>> print(ch)
日本
It's very important to note that Unicode characters are encoded into a sequence of bytes when they are
written to a file. A sequence of bytes read from a file can be decoded to get the Unicode characters.
Once inside the computer's memory, in a Python program, there's no encoding. Just characters.
There are a variety of Unicode encoding schemes. The choice of encoding is based on assumptions about
the typical number of bytes for a character. For example, the UTF-16 codes are most efficient when most
of the characters actually use two bytes and there are relatively few exceptions. The UTF-8 codes, on the other
hand, work well on the internet where many of the protocols expect only the US ASCII characters.
For the most part, we can use the io module to control opening and closing files with specific encodings.
In the rare event that we need really fine control over the encoding, the codecs module provides mechanisms
for encoding and decoding Unicode strings.
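A minimal sketch of encoding and decoding, assuming Python 3. The file name sample.txt and the temporary directory are arbitrary choices for the illustration.

```python
import io
import os
import tempfile

text = u'\u65e5\u672c'

# Encoding turns characters into bytes; decoding reverses it.
data = text.encode('utf-8')
print(len(text), len(data))          # 2 6
print(data.decode('utf-8') == text)  # True

# io.open() applies the encoding as the file is written and read.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'sample.txt')
with io.open(path, 'w', encoding='utf-8') as target:
    target.write(text)
with io.open(path, 'r', encoding='utf-8') as source:
    recovered = source.read()
print(recovered == text)             # True

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(folder)
```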
See http://www.unicode.org for more information.
Raw Strings. If an r or R is put in front of the string (for example, r"raw\nstring"), this indicates a raw
string. This is a string where the backslash characters (\) are not interpreted by the Python compiler but
are left as is. This is handy for Windows file names, which contain \. It is also handy for regular expressions
that make heavy use of backslashes. We'll look at these in Text Processing and Pattern Matching : The re
Module.
"\n" is an escape that's converted to a single unprintable newline character.
r"\n" is two characters, \ and n.
The repr() function also converts an object to a string. However, repr() creates a string suitable for use
as Python source code. For simple numeric types, it's not terribly interesting. For more complex types,
however, it reveals details of their structure.
Important: Python 3
In Python 2, the repr() function can also be invoked using the backtick (`), also called accent grave.
This ` syntax is not used much and will be removed from Python 3.
Here are several versions of a very long string, showing a number of representations.
long symbolizer
on multiple lines
The above example shows the UTF-8 encoding for 日本 as a string of bytes and as a Python Unicode string.
The Unicode string character numbers (u65e5 and u672c) are easier to read as a Unicode string than they
are in the UTF-8 encoding.
The * Operator. The * operator between strings and numbers (number * string or string * number) creates
a new string that is a number of repetitions of the argument string.
>>> print(2*"way " + "cool!")
way way cool!
The [] operator. The [] operator can extract a single character or a substring from the string. There are
two forms for picking items or slices from a string.
This form extracts a single item.
string[index]
Items are numbered from 0 to len(string)-1. Items are also numbered in reverse from -len(string) to
-1.
This extracts a slice, creating a sequence from a sequence.
string[start:end]
Characters from start to end-1 are chosen to create a new string as a slice of the original string; there will
be end - start characters in the resulting string. If start is omitted, the slice starts at the beginning of the
string (position 0); if end is omitted, the slice runs through the last character of the string (position -1).
For more information on how the numbering works for the [] operator, see Numbering From Zero.
Important: The meaning of []
Note that the [] characters are part of the syntax.
We use [ and ] for optional elements. This is not part of the syntax, but a description of optional syntactic
elements. This can lead to confusion because there are two meanings for [] characters.
Since most technical documentation uses [ and ] for optional elements, we've elected to stick with that rather
than try to adopt something more clear, but atypical.
Here are some examples of picking out individual items or creating a slice composed of several items.
>>> s="artichokes"
>>> s[2]
't'
>>> s[:5]
'artic'
>>> s[5:]
'hokes'
>>> s[2:3]
't'
>>> s[2:2]
''
The last example, s[2:2], shows an empty slice. Since the slice is from position 2 to position 2-1, there
can't be any characters in that range; it's a kind of contradiction to ask for characters 2 through 1. Python
politely returns an empty string, which is a sensible response to the expression.
Recall that string positions are also numbered from right to left using negative numbers. s[-2] is the
next-to-last character. We can, then, say things like the following to work from the right-hand side instead of the
left-hand side.
>>> s="artichokes"
>>> s[-2]
'e'
>>> s[-3:-1]
'ke'
>>> s[-1:1]
''
The template string is "Today's temp is %dC (%dF)". The two values are (3, 37.39). You can see that
the values were used to replace the %d conversion specifications.
Our template string, then, was really in five parts:
1. Today's temp is is literal text, and appears in the result string.
2. %d is a conversion specification; it is replaced with the string conversion of 3. Okay, it seems kind of
silly, but 3 in Python is a number, not a string, and it has to be converted to a string. The print()
function does this automatically. Also, when we work in the IDLE Python Shell, IDLE does this kind
of string conversion automatically. We've been spoiled.
3. C ( is literal text, and appears in the result string.
4. %d is a conversion specification; it is replaced with the string conversion of 37.39. While it isn't obvious
what happened, here's a hint: the %d specification produces decimal integers. To produce an integer
from a floating-point number, two conversions had to happen.
5. F) is literal text, and appears in the result string.
For details, see the Python Library Reference.
http://docs.python.org/release/2.6/library/stdtypes.html#string-formatting-operations
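The five-part template can be tried directly. This sketch assumes Python 3; the variable name template and the final %.1f variation are additions for illustration.

```python
template = "Today's temp is %dC (%dF)"
print(template % (3, 37.39))       # Today's temp is 3C (37F)

# The two conversions behind %d: float to integer, then integer to string.
print("%d" % 37.39 == str(int(37.39)))   # True

# A floating-point specification keeps a decimal place instead.
print("Today's temp is %dC (%.1fF)" % (3, 37.39))   # Today's temp is 3C (37.4F)
```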
We're going to focus on the str.format() method. We'll cover that in format() : The Format Method
that belongs to a Unicode number. ord() transforms an ASCII character to its ASCII code number, or
transforms a Unicode character to its Unicode number.
len(iterable) → integer
Return the number of items of a set, sequence or mapping.
>>> len("restudying")
10
>>> len(r"\n")
2
>>> len("\n")
1
Note that a raw string (r"\n") doesn't use escapes; this is two characters. An ordinary string ("\n")
interprets the escapes; this is one unprintable character.
chr(i) → character
Return a string of one character with ordinal i; 0 ≤ i < 256.
This is the standard US ASCII conversion, chr(65) == 'A'.
ord(character) → integer
Return the integer ordinal of a one character string. For an ordinary character, this will be the US
ASCII code. ord('A') == 65.
For a Unicode character this will be the Unicode number. ord(u'\u65e5') == 26085.
unichr(i) → Unicode string
Return a Unicode string of one character with ordinal i; 0 ≤ i < 65536. This is the Unicode mapping,
defined in http://www.unicode.org/.
>>> unichr(26085)
u'\u65e5'
>>> print(unichr(26085))
>>> ord(u'\u65e5')
26085`
Note that min() and max() also apply to strings. The min() function will return the character closest to the
front of the alphabet. The max() function returns the character closest to the back of the alphabet.
>>> max('restudying')
'y'
>>> min('restudying')
'd'
Numbers aren't interpreted numerically, but as a string of characters; consequently '11' comes before
'2'. Why? Compare the two strings, position-by-position: the first character, '1', comes before '2'.
They may look like numbers to you; but they're strings to Python.
Here are some examples.
>>> 'hello' < 'world'
True
>>> 'inordinate' > 'in'
True
>>> '1' < '11'
True
>>> '2' < '11'
False
These rules for alphabetical order are much simpler than, for example, the American Library Association
Filing Rules. Those rules are quite complex and have a number of exceptions and special cases.
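Here is a small sketch of what the character-by-character rule means when sorting. The list of labels is made up for illustration; sorted() with key=int is one way to ask for numeric order instead.

```python
labels = ['2', '11', '1']

# Plain string comparison: '1' < '11' < '2'.
as_text = sorted(labels)

# Converting each string with int() for comparison gives numeric order.
as_numbers = sorted(labels, key=int)

print(as_text)     # ['1', '11', '2']
print(as_numbers)  # ['1', '2', '11']
```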
There are two additional string comparisons: in and not in. These check to see if a single character string
occurs in a longer string. The in operator returns True when the character is found in the string, False
if the character is not found. The not in operator returns True if the character is not found in the string.
>>> "i" in 'microquake'
True
>>> "i" in 'formulates'
False
When in doubt, break down the *= operator to its component parts. It helps to think of the statement like
this: value = value * 'hello'.
The for Statement. Since a string is a sequence, the for statement will visit each character of the string.
for c in "lobstering":
    print(c)
The print Statement. The print must convert each expression to a string before writing the strings to
the standard output file. We prefer, however, to use the print() function.
str.lower() string
Return a copy of the original string converted to lowercase.
"SuperLight".lower() creates 'superlight'.
str.lstrip() string
Return a copy of the original string with leading whitespace removed. This is often used to clean up
input.
" precasting \n".lstrip() creates 'precasting \n'.
str.replace(old, new [, count ]) string
Return a copy of the original string with all occurrences of substring old replaced by new. If the
optional argument count is given, only the first count occurrences are replaced.
A common use is "$HOME/some/place".replace("$HOME","e:/book"), which replaces the "$HOME"
string to create a new string 'e:/book/some/place'.
Once in a while, we'll need to replace just the first occurrence of some target string, allowing us to do
something like the following: 'e:/book/some/place'.replace( 'e', 'f', 1 ).
str.rjust(width) string
Return a copy of the original string right justified in a string of length width. Padding is done using
spaces on the left.
"fulminates".rjust(15) creates '     fulminates'.
With more visible spaces, this is
'␣␣␣␣␣fulminates'
str.rstrip() string
Return a copy of the original string with trailing whitespace removed. This has an obvious symmetry
with lstrip().
" precasting \n".rstrip() creates ' precasting'.
str.strip() string
Return a copy of the original string with leading and trailing whitespace removed. This combines
lstrip() and rstrip() into one handy package.
" precasting \n".strip() creates 'precasting'.
str.swapcase() string
Return a copy of the original string with uppercase characters converted to lowercase and vice versa.
str.title() string
Return a titlecased version of the original string. Words start with uppercase characters, all remaining
cased characters are lowercase.
For example, "hello world".title() creates 'Hello World'.
str.upper() string
Return a copy of the original string converted to uppercase.
Accessors. The following methods provide information about a string.
class str
str.count(sub[, start, end ]) integer
Return the number of occurrences of substring sub in a string. If the optional arguments start and end
are given, they are interpreted as if you had said string [ start : end ].
For example "hello world".count("l") is 3.
str.endswith(suffix [, start, end ]) boolean
Return True if the string ends with the specified suffix, otherwise return False. With optional start,
or end, the test is applied to string [ start : end ].
"pleonastic".endswith("tic") returns True.
str.find(sub[, start, end ]) integer
Return the lowest index in the string where substring sub is found. If optional arguments start and
end are given, then string [ start : end ] is searched. Return -1 on failure.
"rediscount".find("disc") returns 2; "postlaunch".find("not") returns -1.
str.index(sub) integer
Like find() but raise ValueError when the substring is not found.
See The Unexpected : The try and except statements for more information on processing exceptions.
str.isalnum() boolean
Return True if all characters in the string are alphanumeric (a mixture of letters and numbers) and
there is at least one character in the string. Return False otherwise.
str.isalpha() boolean
Return True if all characters in the string are alphabetic and there is at least one character in the
string. Return False otherwise.
str.isdigit() boolean
Return True if all characters in the string are decimal digits and there is at least one character in the
string, False otherwise.
str.islower() boolean
Return True if all characters in the string are lowercase and there is at least one cased character in the
string, False otherwise.
str.isspace() boolean
Return True if all characters in the string are whitespace and there is at least one character in the
string, False otherwise. Whitespace characters include spaces, tabs, newlines and a handful of other
non-printing ASCII characters.
str.istitle() boolean
Return True if the string is a titlecased string, i.e. uppercase characters may only follow uncased
characters and lowercase characters only cased ones, False otherwise.
str.isupper() boolean
Return True if all characters in the string are uppercase and there is at least one cased character in
the string, False otherwise.
str.rfind(sub[, start, end ]) integer
Return the highest index in the string where substring sub is found. Since this is the highest index,
this is looking for the right-most occurrence, hence the r in the name. If optional arguments start and
end are provided, then string [ start : end ] is searched. Return -1 on failure to find the requested
substring.
str.rindex(sub) integer
Like rfind() but raise ValueError when the substring is not found.
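A short sketch may help compare the searching methods described above. The sample text is made up; the only point is to show which position each method reports and how a missing substring is signalled.

```python
text = "hello world"

first_o = text.find("o")     # 4: the left-most "o"
last_o = text.rfind("o")     # 7: the right-most "o"
missing = text.find("z")     # -1: not found, but no exception

# index() reports a missing substring by raising ValueError instead.
try:
    text.index("z")
    outcome = "found"
except ValueError:
    outcome = "not found"

print(first_o)
print(last_o)
print(missing)
print(outcome)
```

Use find() when -1 is a convenient answer to test for; use index() when a missing substring is a genuine error.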
str.startswith(prefix [, start, end ]) boolean
Return True if the string starts with the specified prefix, otherwise return False. With optional start,
or end, test string [ start : end ].
"E:/programming".startswith("E:") is True.
Parsers. The following methods create another kind of object, usually a sequence, from a string.
class str
str.split(sep[, maxsplit ]) sequence
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given,
at most maxsplit splits are done. If sep is not specified, any whitespace string is a separator.
We can use this to do things like aList= "a,b,c,d".split(','). We'll look at the resulting sequence
object closely in Flexible Sequences : The list.
str.splitlines(keepends) sequence
Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the
resulting list unless keepends is given and True. This method can help us process a file: a file can be
looked at as if it were a giant string punctuated by \n characters.
We can break up a string into individual lines using statements like
lines= "two lines\nof data".splitlines().
str.partition(punctuation) tuple
Locate the left-most occurrence of punctuation. If found, split the string into three parts: the part
before, the punctuation that was found, and the part after.
If the punctuation was not found, then the first element is the whole string and the last two elements
are zero-length strings.
>>> first, punct, last = "label :: several :: values".partition( "::" )
>>> first
'label '
>>> punct
'::'
>>> last
' several :: values'
str.rpartition(punctuation) tuple
Similar to str.partition(), except it searches for the right-most occurrence of the punctuation.
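To see the difference between the two methods, here is a sketch that reuses the sample string from the partition() example; the variable names are only for illustration.

```python
text = "label :: several :: values"

# rpartition() splits at the right-most "::".
head, punct, tail = text.rpartition("::")
print(repr(head))   # 'label :: several '
print(repr(tail))   # ' values'

# When the punctuation is absent, partition() keeps the text in the first slot
# and rpartition() keeps it in the last slot.
not_found_left = "no marks".partition("::")
not_found_right = "no marks".rpartition("::")
print(not_found_left)    # ('no marks', '', '')
print(not_found_right)   # ('', '', 'no marks')
```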
Here's another example of using some of the string methods and slicing operations.
temperature.py
[The 13-line listing for temperature.py is missing here. The numbered notes below refer to lines of that listing.]
2. The str.isdigit() method tells us if the string is all digits, or contains some extra characters. If the
input string ends with C or F, we'll handle this small typing mistake gracefully.
5. This is the standard "break a string at a position" pattern. In this case, we are breaking at the last
position of the string. The final character will be assigned to the unit variable, which we expect to be
C or F.
7. We use the str.upper() method to create a new string which is only uppercase letters. In the long run,
this is simpler and more reliable than messing around with unit.startswith("C") or unit.startswith("c").
8. We use the str.startswith() method to examine the first part of the user's input. This will allow
the user to spell out Celsius or Fahrenheit.
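The techniques in these notes can be tried out in a small sketch. This is not the temperature.py listing; the helper name split_reading() and its return value are assumptions made only for illustration.

```python
def split_reading(text):
    """Return (number, unit) for input such as '37', '37F' or '3c'.

    Hypothetical helper: the name and the return shape are assumptions.
    """
    text = text.strip()
    if text.isdigit():
        # All digits: no unit letter was typed.
        return int(text), ""
    # Break the string at the last position.
    value, unit = text[:-1], text[-1:]
    # Normalize the case once instead of testing for both "C" and "c".
    unit = unit.upper()
    if unit.startswith("C"):
        return int(value), "C"
    if unit.startswith("F"):
        return int(value), "F"
    raise ValueError("expected a number optionally followed by C or F")

print(split_reading("37F"))  # (37, 'F')
print(split_reading("3c"))   # (3, 'C')
print(split_reading("12"))   # (12, '')
```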
The template string is "Today's temp is {0:d}C ({1:.2f}F)". The two values are (3, 37.39). You can
see that the values were used to replace the {0:d} and {1:.2f} conversion specifications.
Our template string, then, was really in five parts:
1. "Today's temp is " is literal text, and appears in the result string.
2. {0:d} is a conversion specification; it is replaced with the string conversion of 3. Okay, it seems kind
of silly, but 3 in Python is a number, not a string, and it has to be converted to a string.
3. "C (" is literal text, and appears in the result string.
4. {1:.2f} is a conversion specification; it is replaced with the string conversion of 37.39.
5. "F)" is literal text, and appears in the result string.
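Here is the same template applied to the two values, as a minimal sketch; the variable names are only for illustration.

```python
template = "Today's temp is {0:d}C ({1:.2f}F)"

# {0:d} takes the first argument, {1:.2f} takes the second.
result = template.format(3, 37.39)

print(result)  # Today's temp is 3C (37.39F)
```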
{field_name}
{field_name:format}
The mandatory field_name specifies which piece of data is taken from the arguments.
The optional format specifies how that piece of data should be formatted. If there's no :format
then the object is converted to a string using default formatting rules.
The {} and : are part of the syntax.
In the {0:d} example, the field_name is 0 (the first argument value). The format is d.
In the {1:.2f} example, the field_name is 1 (the second argument value). The format is .2f.
This is just an overview of the most important parts. We've left quite a bit out.
Format Features. Each format actually has a fairly large number of optional features. The full format has
seven parts. All of these are optional.
[fill][sign][options][width][.precision][type]
The fill takes one or two characters. It's one of the four alignment characters with an optional prefix. This
leads to 8 possibilities.
<, fill<. Align to the left. Fill any extra positions on the right with spaces or the fill character.
>, fill>. Align to the right. Fill any extra positions on the left with spaces or the fill character.
Using *> will prepend * to a number.
^, fill^. Center in the available space. Fill extra positions on left and right with spaces or the fill
character.
=, fill=. Put the padding between sign and digits. The sign is specified separately, and it's common
to use both fill and sign. For example, =+ to explicitly show the sign followed by spaces. Another
common use is 0=+ to show a sign followed by leading zeroes.
The sign is one character. + shows all signs. - shows only negative signs. A space uses a space for positive
and a sign for negative.
One of the options characters is a #. If present, then a prefix (0b, 0o, 0x) is used for binary, octal or
hexadecimal conversions.
The other options character is a 0. If present, leading zeroes are padded. This is the same as a 0= fill
specification. In effect, it makes the = optional.
The width is the overall number of positions into which the number is converted. By default, numbers are
right-aligned with leading spaces, and strings are left-aligned with trailing spaces. The various fill and
sign options, however, provide a great deal of control over how the number is fit into the available width.
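A few of these combinations are easier to see than to describe. The sketch below formats the same number several ways; the width of 8 and the variable names are arbitrary choices for illustration.

```python
value = 42

star_right = "{0:*>8d}".format(value)   # '******42'
left = "{0:<8d}".format(value)          # '42      '
centered = "{0:^8d}".format(value)      # '   42   '
sign_pad = "{0:=+8d}".format(value)     # '+     42'
zero_pad = "{0:0=+8d}".format(value)    # '+0000042'
zero_short = "{0:+08d}".format(value)   # '+0000042', the 0 option stands in for 0=
hex_prefix = "{0:#x}".format(255)       # '0xff'
default = "{0:8d}".format(value)        # '      42', numbers right-align by default

for text in (star_right, left, centered, sign_pad,
             zero_pad, zero_short, hex_prefix, default):
    # repr() makes the padding spaces visible.
    print(repr(text))
```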
The .precision is the number of decimal places to include. The . is required to show that this is the precision.
. clearly separates precision from width.
The type is the kind of data conversion to apply. There are two broad categories of conversion: integer and
floating-point.
The most common integer conversion codes are d and n. The d conversion is ordinary decimal numbers.
Additional integer conversions include b, o, x and X for binary, octal and hexadecimal.
The common float conversion codes are e, E, f, g and G. The e and E conversions give scientific notation
(3.739000e+01). The f conversion gives ordinary-looking numbers. The g and G conversions choose between
f and e formatting. An additional float conversion is % which multiplies by 100 to provide a good-looking
percentage value.
Also, there's an n conversion for localized numbers with proper , or . separators and decimal points.
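As a quick sketch of the type codes, here are a few conversions of made-up values. The n conversion is left out because its output depends on the locale settings of your computer.

```python
integers = "{0:d} {0:b} {0:o} {0:x} {0:X}".format(255)
scientific = "{0:e}".format(37.39)
fixed = "{0:.2f}".format(37.39)
general = "{0:g}".format(37.39)
percent = "{0:.1%}".format(0.4737)

print(integers)    # 255 11111111 377 ff FF
print(scientific)  # 3.739000e+01
print(fixed)       # 37.39
print(general)     # 37.39
print(percent)     # 47.4%
```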
Examples. Here are some examples of messages with more complex templates.
"{0}: {1} win, {2} loss, {3:6.3f}".format(count,win,loss,float(win)/loss)
This example does four conversions: three simple integer and one floating-point that provides a width of 6
and 3 digits of precision. -0.000 is the expected format. The rest of the string is literally included in the
output.
"Spin {0:>3d}: {1:>2d}, {2}".format(spin,number,color)
This example does three conversions: one number is converted into a right-aligned field with a width of 3,
another converted with a width of 2, and a string is converted, using as much space as the string requires.
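To see what these two templates produce, we can try them with some values. The numbers below are hypothetical, picked only to show the layout.

```python
# Hypothetical values, chosen only to show the output.
count, win, loss = 10, 6, 4
spin, number, color = 7, 5, "red"

line_1 = "{0}: {1} win, {2} loss, {3:6.3f}".format(count, win, loss, float(win)/loss)
line_2 = "Spin {0:>3d}: {1:>2d}, {2}".format(spin, number, color)

print(line_1)  # 10: 6 win, 4 loss,  1.500
print(line_2)  # Spin   7:  5, red
```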
"Win rate: {0:.1%}".format( win/float(spins) )
string.octdigits 01234567
string.printable All printable characters in the character set
string.punctuation All punctuation in the character set. For ASCII, this is
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
Some programmers who have extensive experience in other languages will ask if creating a new string from
the original strings is the most efficient way to accomplish this. Or they suggest that it would be simpler to
allow mutable strings for this kind of concatenation. The short answer is that Python's storage management
makes this use of immutable strings the simplest and most efficient. We'll discuss this in some depth in
Sequence FAQs.
Removing Characters From A String. Sometimes we want to remove some characters from a string.
Python encourages us to create a new string that is built from pieces of the original string. For example:
>>> s="black,thorn"
>>> s = s[:5] + s[6:]
>>> s
'blackthorn'
In this example, we dropped the sixth character (in position 5), the comma. Recall that the positions are numbered
from zero. Positions 0, 1 and 2 are the first three characters. Position 5 is the sixth character. Here's how
this example works.
1. Create a slice of s using characters up to the fifth. This is positions 0 through 4, a total of five
characters.
2. Create a slice of s using characters starting from position 6 (the seventh character) through the end of
the string.
3. Assemble a new string from these two slices; the sixth character (position 5) will have been ignored
when we created the two slices.
In other languages, there are sophisticated methods to delete particular characters from a string. Again,
Python makes this simpler by letting us create a new string from pieces of the old string.
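The three steps above can be wrapped up as a small function. This is only a sketch; remove_at() is a hypothetical name, not one of Python's string methods.

```python
def remove_at(text, position):
    """Return a copy of text without the character at position.

    Hypothetical helper, not part of Python's str methods.
    """
    # Everything before the position, plus everything after it.
    return text[:position] + text[position + 1:]

cleaned = remove_at("black,thorn", 5)
print(cleaned)  # blackthorn
```

The original string is never changed; the function builds and returns a new one.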
Breaking a String at a Fixed Position. Often, we will break a string into pieces based on a fixed format.
Python gives us a very handy way to do this.
>>> fn="1985 Mar 19"
>>> year= fn[:4]
>>> month= fn[5:8]
>>> day= fn[-2:]
>>> month
'Mar'
>>> day
'19'
Breaking a String at a Punctuation Mark. There are numerous variations on the parsing theme. We'll
look at just one: locating a punctuation mark to split a string.
>>> prop="name : value which has : in it"
>>> label, _, value = prop.partition( ":" )
>>> label.rstrip()
'name'
>>> value.lstrip()
'value which has : in it'
In this example, we assigned the punctuation mark to the variable _. This variable is sometimes used as a
"don't care" variable. We know that str.partition() always provides three values, but we only want two
of them.
this long set of comparisons down to a shorter expression that we can evaluate in a loop. We can use
w[0] <= w[1], and w[1] <= w[2] to examine each letter and its successor.
Write a loop to examine each character to determine if the letters of the word occur in alphabetical
order. Words like almost or biopsy have the letters in alphabetical order.
2. Roman Numerals.
This is similar to translating numbers to English. Instead we will translate them to Roman Numerals.
The Algorithm is similar to Check Amount Writing (above). You will pick off successive digits, using
amount%10 and amount/10 to gather the digits from right to left.
The rules for Roman Numerals involve using three pairs of symbols for ones and five, tens and fifties,
hundreds and five hundreds. An additional symbol for thousands covers all the relevant bases.
When a number is followed by the same or smaller number, it means addition. II is two 1's = 2.
VI is 5 + 1 = 6.
When one number is followed by a larger number, it means subtraction. IX is 1 before 10 = 9. IIX
isn't allowed, this would be VIII.
For numbers from 1 to 9, the symbols are I and V, and the coding works like this.
(a) I
(b) II
(c) III
(d) IV
(e) V
(f) VI
(g) VII
(h) VIII
(i) IX
The same rules work for numbers from 10 to 90, using X and L. For numbers from 100 to 900,
using the symbols C and D. For numbers between 1000 and 4000, using M.
Here are some examples. 1994 = MCMXCIV, 1956 = MCMLVI, 3888= MMMDCCCLXXXVIII
3. Word Lengths.
Analyze the following block of text. You'll want to break it into words on whitespace boundaries.
Then you'll need to discard all punctuation from before, after or within a word.
What's left will be a sequence of words composed of ASCII letters. Compute the length of each word,
and produce the sequence of digits. (No word is 10 or more letters long.)
Compare the sequence of word lengths with the value of math.pi.
Poe, E.
Near a Raven
Midnights so dreary, tired and weary,
Silently pondering volumes extolling all by-now obsolete lore.
During my rather long nap - the weirdest tap!
An ominous vibrating sound disturbing my chamber's antedoor.
"This", I whispered quietly, "I ignore".
Immutability of Tuples. When someone asks about changing a tuple, we have to remind them that the
list, in Flexible Sequences : The list, is for dynamic sequences of items. A tuple is generally used when the
number of items is fixed by the nature of the problem. For example, 2-dimensional geometry, or a 4-part
internet address, or a Cyan-Magenta-Yellow-Black color code. Using a tuple, with a fixed number of items,
saves Python from all of the bookkeeping necessary when there is a dynamic number of items.
Another common use for tuples is to create a function that returns multiple values. When we put multiple
values in a return statement, we are creating a tuple. An example would be a function that simulates rolling
two dice and returns a tuple with two dice values.
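Here is a minimal sketch of such a function. The name roll_dice() is an assumption; random.randint() comes from the standard random module.

```python
import random

def roll_dice():
    """Simulate two six-sided dice and return both values as a tuple."""
    # The comma in the return statement builds a 2-tuple.
    return random.randint(1, 6), random.randint(1, 6)

# The tuple can be unpacked into two variables in one step.
d1, d2 = roll_dice()
print(d1 + d2)
```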
xy A typical 2-tuple.
personal A 3-tuple with name and two numbers.
singleton A 1-tuple. The , is mandatory. Without the ,, this is just an expression in ().
zero_tuple A way to specify a tuple with no actual data in it.
p2 A 3-tuple with a string, another 3-tuple ((3,8,85)) and a Unicode string. The extra , at
the end is quietly ignored.
Important: But Wait!
"But wait!" you say. "The () characters are used to identify parts of an expression. And they identify the
argument values to a function. How can they also be used to define a new tuple object?"
In the case of (), the context helps Python determine how to interpret these characters.
When you have something like a(b), this is a function application.
When you have (b) by itself, this is an expression.
When there is at least one , (as in (a,b) or (a,)), this is a tuple.
If we say just (), this is a tuple with zero items. It's a strange degenerate case, but might be useful as
a placeholder in a complex data object.
A pleasant consequence of this is that an extra comma at the end of a tuple is legal; for example, (9, 10,
56, ) is still a three-tuple.
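These rules can be checked directly. The sketch below uses made-up variable names and asks Python what kind of object each form produces.

```python
b = 42

grouped = (b)             # just an expression in parentheses: still the int 42
singleton = (b,)          # the comma makes this a 1-tuple
empty = ()                # a tuple with zero items
trailing = (9, 10, 56, )  # the extra comma at the end is ignored

print(type(grouped).__name__)    # int
print(type(singleton).__name__)  # tuple
print(len(empty))                # 0
print(len(trailing))             # 3
```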
>>> tuple()
()
>>> tuple( "hi mom" )
('h', 'i', ' ', 'm', 'o', 'm')
In the second example, a string, which is a kind of sequence, is transformed into a tuple of individual
characters.
The * operator. The * operator between tuples and numbers (number * tuple or tuple * number) creates
a new tuple that is a number of repetitions of the input tuple.
>>> 2*(3,"blind","mice")
(3, 'blind', 'mice', 3, 'blind', 'mice')
The [] operator. The [] operator selects an item or a slice from the tuple. There are two forms for picking
items or slices from a tuple.
This form extracts a single item.
tuple[index]
Items are numbered from 0 to len(tuple)-1. Items are also numbered in reverse from -len(tuple) to -1.
This extracts a slice, creating a new sequence from a sequence.
tuple[start:end]
Items from start to end-1 are chosen to create a new tuple as a slice of the original tuple; there will be end
- start items in the resulting tuple. If start is omitted, the slice begins at the beginning of the tuple
(position 0); if end is omitted, the slice runs through the last item of the tuple (position -1).
For more information on how the numbering works for the [] operator, see Numbering from Zero.
Here are some examples of selecting items or slices from a larger 5-tuple.
>>> t=( (2,3), (2,"hi"), (3,"mom"), 2+3j, 6.02E23 )
>>> t[2]
(3, 'mom')
>>> print( t[:3], 'and', t[3:] )
((2, 3), (2, 'hi'), (3, 'mom')) and ((2+3j), 6.02e+23)
>>> print(t[-1], 'then', t[-3:])
6.02e+23 then ((3, 'mom'), (2+3j), 6.02e+23)
The % Operator. The string format operator works between string and tuple. We prefer to use
str.format(), however.
9.3. Doubles, Triples, Quadruples : The tuple
max(iterable) value
Returns the largest value in the iterable (sequence, set or mapping).
>>> stats = ( (5,'zero'), (43,'red'), (52, 'black') )
>>> max( stats )
(52, 'black')
min(sequence) value
Returns the smallest value in the iterable (sequence, set or mapping).
>>> stats = ( (5,'zero'), (43,'red'), (52, 'black') )
>>> min( stats )
(5, 'zero')
Some other functions which apply to sequences in general are available. However, they don't make much
sense for tuples. The iteration functions, like enumerate(), sorted(), reversed() and zip() are valid, but
aren't very meaningful.
Aggregation Functions. The following functions create an aggregate value from a tuple.
sum(iterable) number
Sum the values in the iterable (set, sequence, mapping). All of the values must be numeric.
>>> sum( ( 1, 3, 5, 7, 9 ) )
25
all(iterable) boolean
Return True if all values in the iterable (set, sequence, mapping) are equivalent to True.
>>> compare_1 = ( 2<=3, 5<7, 22%2 == 0 )
>>> all( compare_1 )
True
>>> compare_2 = ( 2 > 3, 5<7, 22%2 == 0 )
>>> all( compare_2 )
False
>>> compare_2
(False, True, True)
any(iterable) boolean
Return True if any value in the iterable (set, sequence, mapping) is equivalent to True.
>>> roll = 7
>>> any( (roll == 7, roll == 11) )
True
>>> any( (roll == 2, roll == 3, roll == 12) )
False
This will create a random number, setting aside the zero and double zero. If the number is in the tuple of
red spaces on the Roulette layout, this is printed. If none of the other rules are true, the number is in one
of the black spaces.
An essential ingredient here is that a tuple has a fixed and known number of items. For example, a
2-dimensional geometric point might have a tuple with x and y. A four-part color code might be a tuple with
c, m, y and b.
This works well because the right side of the assignment statement is fully evaluated before the assignments
are performed. This allows things like swapping two variables with x,y=y,x.
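A short sketch shows both ideas: unpacking a tuple into variables and swapping two values. The values and names are made up for illustration.

```python
point = (2.5, -1.0)   # a 2-dimensional point
px, py = point        # one variable for each item in the tuple

x, y = 3, 8
# The right side, (8, 3), is built first; then it is unpacked into x and y.
x, y = y, x

print(px)
print(py)
print(x)  # 8
print(y)  # 3
```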
The for Statement. The for statement also works directly with sequences like tuples. The range()
function that we have used creates a kind of sequence called a list. A tuple is also a kind of sequence and
can be used in a for statement.
s= 0
for i in ( 1,3,5,7,9, 12,14,16,18, 19,21,23,25,27, 30,32,34,36 ):
    s += i
print("total", s)
$$\sum_{i=0}^{n} f(i)$$
The Σ operator has three additional clauses written around it.
Below are the bound variable, i, and the starting value for the range, written as i = 0.
Above is the ending value for the range, usually something like n.
To the right is some function to evaluate for each value of the bound variable. In this case, a generic
function, f(i).
This is read as "sum f(i) for i in the range 0 to n".
One common definition of Σ uses a closed range; one that includes the end values of 0 and n. This is not a
helpful definition for software; therefore, we will use a half-open interval. It has exactly n items, including
0 and n-1; mathematically, 0 ≤ i < n.
Consequently, we prefer the following notation. It has the bound variable and the range of values written
below. It has the function were evaluating written to the right.
$$\sum_{0 \le i < n} f(i)$$
Since statistical and mathematical texts often used 1-based indexing, some care is required when translating
formulae from textbooks to programming languages that use 0-based indexing.
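The half-open interval lines up neatly with Python's range() function, which also stops just before n. Here is a minimal sketch; the function f() is a placeholder chosen only so there is something to add up.

```python
def f(i):
    """Placeholder function, chosen only for illustration."""
    return i * i

n = 5

# range(n) produces 0, 1, 2, 3, 4: exactly the values with 0 <= i < n.
total = sum(f(i) for i in range(n))

print(total)  # 30
```

A textbook formula written with i running from 1 to n would use range(1, n+1) instead.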
Statistical Algorithms. Our statistical algorithms will be looking at data in lists (or tuples). In this case,
the variable x is a sequence of some kind, and the index (i) is an index to select individual values from the
sequence.
$$\sum_{0 \le i < n} x_i$$
$$\sum_{0 \le i < n} f(x_i)$$
Translating to Python. We can transform this definition directly into a for loop that sets the bound
variable to all of the values in the range, and does some processing on each value of a sequence of integers.
This is the Python implementation of Σ. This computes two values: the sum, sum, and the number of items,
n.
Sigma Using a Numeric Index
sum= 0
for i in range(len(aTuple)):
    x_i= aTuple[i]
    # fxi = some function of x_i
    sum += x_i
n= len(aTuple)
1. Get the length of aTuple. Execute the body of the loop for all values of i in the range 0 to the number
of items-1.
2. Fetch item i from aTuple and assign it to x_i.
3. For a simple mean calculation, the fxi statement does nothing.
For a standard deviation calculation, we'd add a statement at fxi to compute the measure of deviation
from the average.
4. Sum the x_i (or fxi) values.
Simplification. In the usual mathematical notation, an integer index, i, is used. In Python it isn't necessary
to use the formal integer index. Instead, an iterator can be used to visit each item of the list, without actually
using an explicit numeric counter. The processing simplifies to the following.
Sigma Using an Iterator
sum= 0
for x_i in aTuple:
    # fxi = some function of x_i
    sum += x_i
n= len(aTuple)
Example. Here's an example of computing a sum. We're using the 6:00 AM temperatures this week as our
sample data. We created a tuple with the unimaginative name of data for holding this tuple of temperatures.
>>>
>>>
>>>
...
...
>>>
44
>>>
8
Our for statement iterated through the data. The suite within the for statement added the data values into
our accumulator, sum. The sum divided by the count is the mean.
To get precise results, be sure to use from __future__ import division.
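Here is a sketch of the whole calculation in one place. The temperatures are hypothetical. Instead of the __future__ import, this sketch converts the count with float(), which has the same effect on the division. The standard deviation shown divides by n (the population form), which is an assumption about which form is wanted.

```python
import math

data = (45, 48, 41, 39, 44, 46, 45)   # hypothetical 6:00 AM temperatures

# Sigma using an iterator: add up the x_i values.
total = 0
for x_i in data:
    total += x_i
n = len(data)
mean = total / float(n)

# For the standard deviation, f(x_i) is the squared deviation from the mean.
deviation_sum = 0
for x_i in data:
    deviation_sum += (x_i - mean) ** 2
std_dev = math.sqrt(deviation_sum / n)

print(mean)
print(std_dev)
```

The accumulator is named total here so that it does not hide Python's built-in sum() function.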
Purchase Price   Shares   Symbol   Current Price
43.50            25       CAT      92.45
42.80            50       DD       51.19
42.10            75       EK       34.87
37.58            100      GM       37.58
We can represent each block of stock as a 5-tuple with purchase date, purchase price, shares, ticker
symbol and current price. We can create a list of those tuples, as follows.
portfolio= [ ( "25-Jan-2001", 43.50, 25, 'CAT', 92.45 ),
( "25-Jan-2001", 42.80, 50, 'DD', 51.19 ),
( "25-Jan-2001", 42.10, 75, 'EK', 34.87 ),
( "25-Jan-2001", 37.58, 100, 'GM', 37.58 )
]
Develop a function that examines a tuple which represents a block of stock, multiplies shares by
purchase price and returns the value of that block. The sum of these values is the total purchase price
of the portfolio.
This function would have the following definition:
def cost( aBlock ):
    # compute price times shares
    return cost
Develop a second function that examines a tuple which represents a block of stock, multiplies shares
by purchase price and shares by a current price to determine the total amount gained or lost by this
block.
This function would have the following definition:
xi
x =
0i<n
Let's look at a list of Roulette wheel spins. Here's a depiction of a list of four items, the Python value is
["red", "red", "black", "red"]. Each item has a position that identifies where it is in the list.
position   0     1     2       3
item       red   red   black   red
Because a list is mutable, new items can be added to the list. These new items can be inserted in any
position. We can append to the end of the list. We can put elements into the list by inserting before any of
the existing positions. If we insert before position zero, we will extend the list at the beginning. In addition to
extending the list, we can replace any of the items in the list.
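Each of these changes has a direct spelling in Python. The sketch below starts from the four spins shown above; the new values are made up for illustration.

```python
spins = ["red", "red", "black", "red"]

spins.append("black")      # add an item at the end of the list
spins.insert(0, "green")   # insert before position 0, extending at the beginning
spins[2] = "black"         # replace the item at position 2

print(spins)  # ['green', 'red', 'black', 'black', 'red', 'black']
```

All three statements change the one list object in place; no new list is created.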
This statement creates a list using a list comprehension. A comprehension starts with a candidate list
(range(6), in this example) and derives the list values from the candidate using an expression (2*i+1 in
this example). A great deal of power is available in comprehensions.
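As a sketch, here is a comprehension built from the candidate list and expression just described, next to the for loop that does the same job. The variable names are assumptions.

```python
# The comprehension: evaluate 2*i+1 for each candidate i in range(6).
odds = [2*i + 1 for i in range(6)]

# The same list, built one item at a time.
odds_loop = []
for i in range(6):
    odds_loop.append(2*i + 1)

print(odds)  # [1, 3, 5, 7, 9, 11]
```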
9.4. Flexible Sequences : The list
This is a kind of literal list of values, using the [] syntax; it can be used anywhere a literal list is appropriate.
In the second example, a two-element tuple (("black", "red")), a kind of sequence, is transformed into
a list of individual elements.
The range() function is used heavily, primarily to control the for statement. Technically, it generates a
list, so we include it here, after we introduced it briefly in The for Statement.
range([start ], stop[, step ]) list
The arguments must be plain integers. If the step argument is omitted, it defaults to 1. If the start
argument is omitted, it defaults to 0. The full form returns a list of plain integers [ start , start +
step , start + 2 * step , ... ]. If step is positive, the last element is the largest start + i * step less
than stop. If step is negative, the last element is the smallest start + i * step greater than stop. step
must not be zero (or else ValueError is raised).
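A few calls show how start, stop and step interact. This sketch wraps each call in list() so that it prints the same way in Python 3, where range() no longer returns a list directly; the argument values are arbitrary.

```python
counting = list(range(6))            # start defaults to 0, step defaults to 1
odd_numbers = list(range(1, 16, 2))  # positive step: stops before 16
countdown = list(range(10, 0, -3))   # negative step: stops before reaching 0

print(counting)     # [0, 1, 2, 3, 4, 5]
print(odd_numbers)  # [1, 3, 5, 7, 9, 11, 13, 15]
print(countdown)    # [10, 7, 4, 1]
```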
The * operator. The * operator between lists and numbers (number * list or list * number) creates a new
list that is a number of repetitions of the input list.
>>> 2*["pass","don't","pass"]
['pass', "don't", 'pass', 'pass', "don't", 'pass']
The [] operator. The [] operator selects an item or a slice from the list. There are two forms for picking
items or slices from a list.
This form extracts a single item.
list[index]
Items are numbered from 0 to len(list)-1. Items are also numbered in reverse from -len(list) to -1.
This extracts a slice, creating a new sequence from a sequence.
list[start:end]
Items from start to end-1 are chosen to create a new list as a slice of the original list; there will be end - start
items in the resulting list. If start is omitted, the slice begins at the beginning of the list (position 0); if end
is omitted, the slice runs through the last item of the list (position -1).
For more information on how the numbering works for the [] operator, see Numbering from Zero.
In the following example, we've constructed a list, rolls, where each of the six items in the list is a tuple
object. Each of these tuple objects is a pair of dice. When we say rolls[2], we're extracting the item at
position 2, which is the third item from the list. In this example, it's a hard 4, a pair of 2's.
>>> rolls=[(6, 2), (5, 4), (2, 2), (1, 3), (6, 5), (1, 4)]
>>> rolls[2]
(2, 2)
>>> print(rolls[:3], 'split', rolls[3:])
[(6, 2), (5, 4), (2, 2)] split [(1, 3), (6, 5), (1, 4)]
>>> rolls[-1]
(1, 4)
>>> rolls[-3:]
[(1, 3), (6, 5), (1, 4)]
The % Operator. The string format operator works between string and list. We prefer to use str.format(),
however.
max(sequence) value
Returns the largest value in the iterable (sequence, set or mapping).
>>> rolls=[(6, 2), (5, 4), (2, 2), (1, 3), (6, 5), (1, 4)]
>>> max(rolls)
(6, 5)
Recall that tuples are compared element-by-element. The tuple (6, 5) has a first element that is
greater than all but one other tuple, (6, 2). If the first elements are the same, then the second
element is compared.
min(sequence) value
Returns the smallest value in the iterable (sequence, set or mapping).
>>> rolls=[(6, 2), (5, 4), (2, 2), (1, 3), (6, 5), (1, 4)]
>>> min(rolls)
(1, 3)
Recall that tuples are compared element-by-element. The tuple (1, 3) has a first element that is less
than all but one other tuple, (1, 4). If the first elements are the same, then the second element is
compared.
Iteration Functions. These functions are most commonly used with a for statement to process list items.
enumerate(iterable) → iterator
Enumerate the elements of a set, sequence or mapping. This yields a sequence of tuples based on the
original list. Each of the tuples has two elements: a sequence number and the item from the original
list.
This is generally used with a for statement. Here's an example:
>>> rolls=[(6, 2), (5, 4), (2, 2), (1, 3), (6, 5), (1, 4)]
>>> for position, roll in enumerate( rolls ):
...     print(position, sum(roll))
...
0 8
1 9
2 4
3 4
4 11
5 5
Now there are two copies of the original list: rolls is in the original order; descending is in descending
order.
reversed(sequence) → iterator
This iterates through a sequence in reverse order.
This is generally used with a for statement. Here's an example:
>>> stats = [ (43,'red'), (52, 'black'), (5,'zero') ]
>>> for count, color in reversed( stats ):
...     print(count, color)
...
5 zero
52 black
43 red
sum(iterable) → number
Sum the values in the iterable (set, sequence, mapping). All of the values must be numeric.
>>> range(1,8*2,2)
[1, 3, 5, 7, 9, 11, 13, 15]
>>> sum(_)
64
all(iterable) → boolean
Return True if all values in the iterable (set, sequence, mapping) are equivalent to True.
The all() function is often used with List Comprehension, which we'll look at in List Construction
Shortcuts.
>>> compare_1 = [ 2<=3, 5<7, 22%2 == 0 ]
>>> all( compare_1 )
True
>>> compare_2 = [ 2 > 3, 5<7, 22%2 == 0 ]
>>> all( compare_2 )
False
>>> compare_2
[False, True, True]
any(iterable) → boolean
Return True if any value in the iterable (set, sequence, mapping) is equivalent to True.
The any() function is often used with List Comprehension, which we'll look at in List Construction
Shortcuts.
>>> roll = 7
>>> test = [ roll == 2, roll == 3, roll == 12 ]
>>> any( test )
False
>>> test.append( roll == 7 )
>>> test.append( roll == 11 )
>>> any( test )
True
>>> test
[False, False, False, True, False]
This will create two random numbers, simulating a roll of dice. If the number is in the list of field bets,
this is printed. Note that we assemble the final list of field bets from two other lists. In a larger application
program, we might distinguish between the field bets based on different payout odds.
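A sketch of the kind of program this paragraph describes might look like the following. The variable names, and the way the field numbers are split into two lists by payout, are assumptions made for illustration; check the rules at your own table.

```python
import random

# Assumption: 2 and 12 are the field numbers with the bonus payout,
# and the rest of the field pays even money.
bonus_field = [2, 12]
even_field = [3, 4, 9, 10, 11]

# Assemble the final list of field bets from the two other lists.
field = bonus_field + even_field

# Two random numbers, each from 1 to 6, simulate a roll of the dice.
d1 = random.randrange(1, 7)
d2 = random.randrange(1, 7)
roll = d1 + d2

if roll in field:
    print("field bet wins on {0}".format(roll))
```

Keeping the two smaller lists around means a larger program could test roll in bonus_field to decide which payout applies.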
We have to note that comparing two lists which have very different contents may not be sensible. When we
compare two strings, we can use this to put them into alphabetic order. In the case of comparing tuples, we
generally compare tuples of the same length. For example, we might compare some three-tuples that encode
red-green-blue colors. This is consistent with the ways we use tuples to represent a piece of data that has a
fixed number of individual items.
In the case of lists, however, we have to be sure that we have an obvious meaning for the comparison. Python
will allow us to compare any two list objects. As designers of programs, we have to be sure we are making
a sensible comparison between objects that should be compared in the first place. We don't want to have
programs that do senseless things like compare a list of the 46 highest peaks in New York with the list of
ingredients in Fettuccine Alfredo.
The list.append() method function does not return a new value. It modifies the object. The return value
happens to be None.
class list
list.append(object)
Update list l by appending object to end of the list.
>>> a=["red","orange","yellow"]
>>> a.append("green")
>>> a
['red', 'orange', 'yellow', 'green']
list.extend(sequence)
Extend the list by appending sequence elements. Note the difference from append(object), which
treats the argument as a single list object.
>>> a=["red","orange","yellow"]
>>> a.extend(["green","blue"])
>>> a
['red', 'orange', 'yellow', 'green', 'blue']
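To see the difference side by side, here is a small sketch that hands the same two-item list to both method functions. The variable names are only for illustration.

```python
a = ["red", "orange", "yellow"]
a.append(["green", "blue"])   # the whole list becomes one new item
print(a)
print(len(a))

b = ["red", "orange", "yellow"]
b.extend(["green", "blue"])   # each element becomes its own item
print(b)
print(len(b))
```

After append(), the last item of a is itself a list; after extend(), b is a flat list of five strings.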
list.insert(index, object)
Update list l by inserting object before position index. If index is greater than len(list),
the object is simply appended. A negative index counts back from the end of the list; if index is
less than -len(list), the object is prepended.
>>> a=["red","yellow","green"]
>>> a.insert(1,"orange")
>>> a
['red', 'orange', 'yellow', 'green']
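The positions at the edges are worth trying for yourself. In this sketch the color names are just sample data; the interesting part is where each new item lands.

```python
a = ["red", "yellow", "green"]

a.insert(100, "blue")     # far past the end: behaves like append()
print(a)

a.insert(-1, "indigo")    # -1 names the last item, so this lands just before it
print(a)

a.insert(-100, "white")   # far before the start: lands at the front
print(a)
```

Note that insert(-1, ...) does not add to the end of the list; use append() for that.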
list.pop(index) → item
Remove and return item at index (default is the last element, with an index of -1). An exception is raised
if the list is already empty. This is the opposite of append(). Further, this is both a transformation
of the list as well as an accessor that returns an item from the list.
>>> a=["red","yellow","green","blue"]
>>> a.pop()
'blue'
>>> a
['red', 'yellow', 'green']
list.remove(value)
Remove first occurrence of value from list l. An exception is raised if the value is not in the list.
This example has a list of four initial values, a string, a number, the result of an expression (which will
be a number), and a tuple. We'll remove the tuple (4,3,"craps") from the list.
>>> a=["red",21,6*6,(4,3,"craps")]
>>> a.remove( (4,3,"craps") )
>>> a
['red', 21, 36]
list.reverse()
Reverse the items of the list l. This is done in place; it does not create a new list.
>>> a=["red","yellow","green","blue"]
>>> a.reverse()
>>> a
['blue', 'green', 'yellow', 'red']
The list sort() transformation is very powerful. We'll look at more sophisticated sorting options in Sorting
a List: Expanding on the Rules. For now, let's just look at the following simple examples. We'll sort simple
lists of numbers and strings just to show you how this works.
>>> a= [ 10, 1, 3, 9, 4 ]
>>> a.sort()
>>> a
[1, 3, 4, 9, 10]
>>> b= [ "word", "topic", "subject", "part", "section", "chapter" ]
>>> b.sort()
>>> b
['chapter', 'part', 'section', 'subject', 'topic', 'word']
Accessors. The following method functions determine a fact about a list and return that as a value.
class list
list.count(value) → integer
Return number of occurrences of value in list l.
>>> a=["red","red","black","red"]
>>> a.count("red")
3
>>> a.count("green")
0
>>> a.count("black")
1
list.index(value) → integer
Return index of first occurrence of value in the list. If the item is not found, this will raise a ValueError.
If the given value is in the list, then list[ list.index(value) ] is value.
>>> a=["red","yellow","green","blue"]
>>> a.sort()
>>> a.index('red')
2
>>> a[2]
'red'
>>> a
['blue', 'green', 'red', 'yellow']
list.pop(index) → item
Remove and return item at index (default is the last element, with an index of -1). An exception is raised
if the list is already empty. This is the opposite of append(). Further, this is both a transformation
of the list as well as an accessor that returns an item from the list.
>>> a=["red","yellow","green","blue"]
>>> a.pop()
'blue'
>>> a
['red', 'yellow', 'green']
This will only work if the list has a fixed and known number of elements. This kind of multiple assignment
makes more sense when working with tuples, which are immutable, rather than lists, which can vary in
length.
The for Statement. The for statement works directly with sequences. When we first looked at for
statements, we used the range() function to create a list for us. We can also create lists other ways. We'll
see still more list construction techniques in the next chapter.
Here's the basic syntax for providing a literal sequence of values. We provide the list object that we want
the for statement to use as the sequence of values. In this example, the variable i will be set to each value
in the list, the prime numbers between 2 and 19.
s= 0
for i in [2,3,5,7,11,13,17,19]:
    s += i
print("total", s)
The del Statement. The del statement removes items from a list. For example:
>>> i = range(10)
>>> del i[0], i[2], i[4], i[6]
>>> i
[1, 2, 4, 5, 7, 8]
Queues are often used as buffers to match processing speeds between fast and slow operations. For example,
it takes less than a minute for my computer to generate a document with 402 pages, but my printer will
take almost an hour to print the document. To balance this speed difference, the operating system creates
a queue of print jobs.
Using Lists. Both the stack and queue are essentially a list. In the case of a stack, it is a list that has
items added and removed at the last position only. A queue, on the other hand, has items appended at the
end, but removed from the front of the list.
The append() and pop() method functions can be used to create a standard stack. The append() function
places an item at the end of the list (or top of the stack), where the pop() function can remove it and return
it.
>>> stack= []
>>> stack.append("part I")
>>> stack.append("chapter 1")
>>> stack.append("intro section" )
>>> stack.pop()
'intro section'
>>> stack.append("another section" )
>>> stack.pop()
'another section'
>>> stack.pop()
'chapter 1'
>>> stack.pop()
'part I'
>>> stack
[]
The append() and pop(0) functions can be used to create a standard queue, or first-in-first-out (FIFO)
list. The append() function places an item at the end of the queue. Evaluating pop(0) removes the first
item from the queue and returns it.
>>> queue=[]
>>> queue.append("part I")
>>> queue.append("part II")
>>> queue.pop(0)
'part I'
>>> queue.append("part III")
>>> queue.pop(0)
'part II'
>>> queue.append("part IV")
>>> queue.pop(0)
'part III'
>>> queue.pop(0)
'part IV'
>>> queue
[]
Roulette, there are a large number of outcomes, but we can focus on just betting red and black. This
makes the game almost a coin toss. There are a total of 38 outcomes on an American table, composed
of 18 red, 18 black and 2 green. If we play a number of rounds we'll win some and lose some. If we
stay at the table for 200 spins, our results could vary from the really unlikely 200 wins to the equally
unlikely 200 losses. In the middle are mixes of wins and losses. For a truly fair coin toss, the distribution of
these results is closely approximated by the Gaussian or normal distribution. The question we need to have
answered is: what is the average result of sessions that last for 200 spins of the wheel?
We can create a sequence of values that represents the wheel by assembling a list that has 18 copies
of 'red', 18 copies of 'black' and two copies of 'green'. We can then use the random.choice()
function to pick one of these values as the result of the spin.
If the chosen result is 'black', we've won, and our stake increases by one bet. Otherwise, we've lost
and our stake decreases by one bet.
To simulate a session of betting, we initialize our table stakes to 100 betting units. This means we go
to a $5 Roulette table with $500 in our pocket. We can create a loop which does the following 200
times:
(a) Use the random.choice() function to pick one of 38 values as the result of the spin.
(b) Increase or decrease the stake depending on the color chosen.
Each session, therefore, will have a result that is a single number, the final amount we left the table
with. You can check your result by simulating a few thousand sessions and accumulating a sequence
of final amounts.
Compute the average of your sequence of final amounts. You should have an average result of about
89. The standard deviation should be around 14. What does this mean? We can expect to lose 11
betting units over 200 spins of the wheel.
2. Creating a Different Sequence of Outcomes.
In the previous exercise, we created a random sequence of outcomes for Roulette using a simple "always
bet on black" betting strategy. What if we want to use a bet with a different payout? For example,
the three column bets pay 2:1 when they win. How does this change our results?
In Roulette there are 12 column 1, 12 column 2, 12 column 3, and 2 zero results on an American
table. As with the previous exercise (Creating a Sequence of Outcomes), we can construct a sequence
that represents the wheel by assembling a list of 38 elements that have the proper number of "col1",
"col2", "col3" and "zero" values. We can then use the random.choice() function to pick one of
these values as the result of the spin.
We'll assume a consistent bet on "col3". We'll choose a random result from the wheel sequence; if
this result is "col3", we've won, and our stake increases by two bets. Otherwise, we've lost and our
stake decreases by one bet.
We can revise our previous example to use this wheel, bet and result.
Compute the average of your sequence of final amounts. You should have an average result of about
89. The standard deviation should be around 19. What does this mean? We can expect to lose 11
betting units over 200 spins of the wheel.
3. Creating a Sequence of Really Bad Outcomes.
In the previous exercises, we created a random sequence of outcomes for Roulette using some simple
"always bet on black" or "always bet on column three" betting strategy. What if we want to use a
bet with a really bad payout? For example, there is a bet that covers zero, double zero, one, two and
three. This bet will win 5/38th of the time, but pays as if it won 6.33/38 of the time. How does this
change our results?
In Roulette there are 5 '5bet' results and 33 'other' results on an American table. As with the
previous exercises (Creating a Sequence of Outcomes), we can construct a sequence that represents the
wheel by assembling a list of 38 elements that have the proper number of "5bet", "other" values. We
can then use the random.choice() function to pick one of these values as the result of the spin.
We'll assume a consistent bet on "5bet". We'll choose a random result from the wheel sequence; if
this result is "5bet", we've won, and our stake increases by six bets. Otherwise, we've lost and our
stake decreases by one bet.
We can revise our previous example to use this wheel, bet and result.
Compute the average of your sequence of final amounts. You should have an average result of about
83. The standard deviation should be around 34. What does this mean? We can expect to lose 17
betting units over 200 spins of the wheel.
4. Random Number Evaluation.
Before using a new random number generator, it is wise to evaluate the degree of randomness in the
numbers produced. A variety of clever algorithms look for certain types of expected distributions of
numbers, pairs, triples, etc. This is one of many random number tests.
If we generate thousands of random numbers between 0 and 9, we expect that we'll have the same
number of 0s as 9s. Specifically, we expect that 1/10th of our numbers are 0s, 1/10th are 1s, etc.
Actual random numbers are, well, random, so they will deviate slightly from this perfection.
This difference between actual and expected can be used for a more sophisticated statistical test called
a Chi-Squared test. The formula is pretty simple, but the statistics are beyond this book. The idea is,
however, that the Chi-Squared test can help us tell whether our data is too well organized, meets our
expectation for randomness, or is too disorganized.
What we'll do is generate random numbers, and assign them to one of ten different bins. When
we've done this for a few thousand samples, we'll compare the count of numbers in each bin with our
expectation to see if we've got a respectable level of randomness.
Use random.random() to generate an array of 1000 random samples; assign this to the variable u.
These numbers will be uniformly distributed between 0 and 1.
Distribution test of a sequence of random samples, U
import random
[ random.randrange(0,37) for i in range(1000) ]
[You may have already noticed the error in the above statement.]
We can use the following procedure to do a complete evaluation.
Unique Values of a Sequence, seq
2. Quicksort.
The super-fast sort algorithm.
As a series of loops it is rather complex. As a recursion it is quite short. This is the same basic
algorithm in the C libraries.
Quicksort proceeds by partitioning the list into two regions: one has all of the high values, the other
has all the low values. Each of these regions is then individually sorted into order using the quicksort
algorithm. This means that each region will be subdivided and sorted.
For now, we'll sort an array of simple numbers. Later, we can generalize this to sort generic objects.
Quicksort a List, a between elements lo and hi
(a) Partition
i. Initialize. ls, hs ← lo, hi. Setup for partitioning between ls and hs.
middle ← (ls + hs) ÷ 2.
ii. Swap To Partition. while ls < hs:
If a[ls].key ≤ a[middle].key: increment ls by 1. Move the low boundary of the
partitioning.
If a[ls].key > a[middle].key: swap the values a[ls] ↔ a[middle].
If a[hs].key ≥ a[middle].key: decrement hs by 1. Move the high boundary of the
partitioning.
If a[hs].key < a[middle].key: swap the values a[hs] ↔ a[middle].
(b) Quicksort Each Partition.
QuickSort( a , lo, middle )
QuickSort( a , middle+1, hi )
3. Recursive Search.
This is also a binary search: it works using a design called "divide and conquer". Rather than search
the whole list, we divide it in half and search just half the list. This version, however, is defined with a
recursive function instead of a loop. This can often be faster than the looping version shown above.
Recursive Search a List, seq for a target, tgt, in the region between elements lo and hi.
Sieve of Eratosthenes
(a) Initialize. Create a list, prime, of 5000 booleans, all True, initially.
p ← 2.
(b) Iterate. While 2 ≤ p < 5000.
i. Find Next Prime. While not prime[p] and 2 ≤ p < 5000:
Increment p by 1.
ii. Remove Multiples. At this point, p is prime.
Set k ← p + p.
while k < 5000.
prime[k] ← False.
Increment k by p.
iii. Next p. Increment p by 1.
(c) Report. At this point, for all p if prime[p] is true, p is prime.
while 2 ≤ p < 5000:
if prime[p]: print p
The reporting step is a filter operation. We're creating a list from a source range and a filter rule.
This is ideal for a list comprehension. We'll look at these in List Construction Shortcuts.
Formally, we can say that the primes are the set of values defined by $primes = \{\, p \mid 0 \le p < 5000 \text{ if } prime_p \,\}$.
This formalism looks a little bit like a list comprehension.
5. Polynomial Arithmetic.
We can represent numbers as polynomials. We can represent polynomials as arrays of their coefficients.
This is covered in detail in [Knuth73], section 2.2.4 algorithms A and M.
Example: $4x^3 + 3x + 1$ has the following coefficients: ( 4, 0, 3, 1 ).
The polynomial $2x^2 - 3x - 4$ is represented as ( 2, -3, -4 ).
The sum of these is $4x^3 + 2x^2 - 3$; ( 4, 2, 0, -3 ).
The product of these is $8x^5 - 12x^4 - 10x^3 - 7x^2 - 15x - 4$; ( 8, -12, -10, -7, -15, -4 ).
You can apply this to large decimal numbers. In this case, x is 10, and the coefficients must all be
between 0 and x-1. For example, $1987 = 1x^3 + 9x^2 + 8x + 7$, when $x = 10$.
Add Polynomials, p, q
As an example of using red, green, blue tuples, we may have a list of individual colors that looks like the
following. Here, we've defined three colors (black, a dark grey, a purple) and assigned this list of colors to
the variable colorScheme.
colorScheme = [ (0,0,0), (0x20,0x30,0x20), (0x80,0x40,0x80) ]
An interesting form of the for statement uses multiple assignment to work with a list of tuples. Consider the
following example which assigns r, g and b from each element of the 3-tuple in the list. We can then do
calculations on the three values independently.
colorScheme = [ (0,0,0), (0x20,0x30,0x20), (0x80,0x40,0x80) ]
for r,g,b in colorScheme:
    print("color ({0:d},{1:d},{2:d})".format( r, g, b ))
    print("opposite ({0:d},{1:d},{2:d})".format( 255-r, 255-g, 255-b ))
This is equivalent to the following. In this example, we have the for statement assign each item in the list
to the variable color, and then we use a separate multiple assignment to decompose the tuple into r, g and
b.
colorScheme = [ (0,0,0), (0x20,0x30,0x20), (0x80,0x40,0x80) ]
for color in colorScheme:
    r, g, b = color
    print("color ({0:d},{1:d},{2:d})".format( r, g, b ))
    print("opposite ({0:d},{1:d},{2:d})".format( 255-r, 255-g, 255-b ))
The items() method function of a dictionary transforms a dictionary to a sequence of tuples. We'll cover dictionaries
in Mappings : The dict. This is a teaser for some of what we'll see there.
from __future__ import print_function, division
d = { 'red':18, 'black':18, 'green':2 }
for c,f in d.items():
    print("{0} occurs {1:f}".format(c, f/38))
The zip() built-in function interleaves two or more lists to create a list of tuples, pairing up the items that
share a position. This is not terribly useful by itself, but we'll use it to build dictionaries.
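Here is a small sketch of zip() at work. The list names colors and counts are made up for this example. Wrapping the result in list() is harmless where zip() already returns a list, and is needed in newer versions of Python where zip() returns an iterator instead.

```python
colors = ['red', 'black', 'green']
counts = [18, 18, 2]

# Pair up the items that share a position in the two lists.
pairs = list(zip(colors, counts))
print(pairs)

# A sequence of two-tuples is exactly what dict() needs.
wheel = dict(zip(colors, counts))
print(wheel['green'])
```

The first item of each tuple becomes a key and the second becomes the value, which is why zip() shows up so often when building dictionaries.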
The overall generator expression executes the for loop; for each iteration, it evaluates the expression and
yields a value. The list comprehension uses that sequence of values to create the resulting list.
Here are some examples.
>>> import random
>>> [ 3*x+2 for x in range(12) ]
[2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35]
>>> [ (x,x) for x in (2,3,4,5) ]
[(2, 2), (3, 3), (4, 4), (5, 5)]
>>> [ random.random() for x in range(5) ]
[0.4527184178006578, 0.84888059794845783, 0.21016399448987311, 0.80816095098407259, 0.87693626640363287]
The basic process, then, is to iterate through the sequence in the for-clause, evaluating the expression,
expression. The values that result are assembled into the list.
If the expression depends on the for-clause target variable, the expression is a map from the for-clause
variable to the resulting list. If the expression doesn't depend on the for-clause target value, each time we
evaluate the expression we'll get the same value.
Here's an example where the expression depends on the for-clause. This is a mapping from the range(10)
to the final list.
>>> a= [ v*2+1 for v in range(10) ]
>>> a
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
This creates the first 10 odd numbers. It starts with the sequence created by range(10). The for-clause
assigns each value in this sequence to the target variable, v. The expression, v*2+1, is evaluated for each
distinct value of v. The expression values are assembled into the resulting list.
Typically, the expression depends on the variable set in the for-clause. Here's an example, however, where
the expression doesn't depend on the for-clause.
b= [ 0 for i in range(10) ]
This creates a list of 10 zeros. Because the expression doesn't depend on the for-clause, this could also be
done as
b= 10*[0]
Filter Processing. A comprehension can also have an if-clause. This acts as a filter to determine which
elements belong to the list and which elements do not belong.
The more complete syntax for a list comprehension is as follows:
[ expr for-clause [ for-clause | if-clause ] ... ]
The expr is any expression. The for-clause mirrors the for statement:
for variable in sequence
The if-clause mirrors the if statement:
if filter
This syntax summary shows that the first for-clause is required. This can be followed by either for-clauses
or if-clauses. The | means that we can use either a for-clause or an if-clause.
This syntax summary shows a ... which means that you can repeat as many for-clauses and if-clauses as
you need. We'll stick to the most common form, which is a single if-clause to create a filter.
Note that there's no , or other punctuation; the various for-clauses and if-clauses are simply separated by
spaces.
Here is an example that creates the list of hardways rolls, which excludes the pairs that total 2 and 12. The for loop
creates a sequence of six numbers (from 1 to 6), assigning each value to x. The if filter only keeps values
where x+x is not 2 or 12. Each value that passes the filter is used to create a tuple of (x,x).
hardways = [ (x,x) for x in range(1,7) if x+x not in (2, 12) ]
These more complex list comprehensions behave like the following loop:
r= []
for target in sequence :
    if filter :
        r.append( expr )
The basic process, then, is to iterate through the sequence in the for-clause, evaluating the if-clause. When
the if-clause filter is True, evaluate the expression, expr. The values that result are assembled into the list.
>>> v = [ (x,2*x+1) for x in range(10) if x%3==0 ]
>>> v
[(0, 1), (3, 7), (6, 13), (9, 19)]
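To make the equivalence concrete, here is the hardways comprehension next to the loop it stands for. This is a sketch; the variable r simply follows the naming in the generic loop above.

```python
# The comprehension, with a for-clause and an if-clause.
hardways = [(x, x) for x in range(1, 7) if x + x not in (2, 12)]

# The same work, written out as a for statement with an if statement inside.
r = []
for x in range(1, 7):
    if x + x not in (2, 12):
        r.append((x, x))

print(hardways)
print(r == hardways)
```

Both forms build the four pairs from (2, 2) through (5, 5); the comprehension just says it in one line.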
County      State   Jobs
Albany      NY      162692
Allegany    NY      11986
Wyoming     NY      8722
Yates       NY      5094
We can easily transform this raw data into a sequence of tuples that look like the following.
jobData= [
('001','Albany','NY',162692),
('003','Allegany','NY',11986),
...
('121','Wyoming','NY',8722),
('123','Yates','NY',5094),
]
Simple Sorting. Sorting this list can be done trivially with the list sort() method.
jobData.sort()
Note that this updates the jobData list in place. The sort() method specifically does not return a result.
A common mistake is to say something like: a= b.sort(). This always sets the variable a to None.
This kind of sort will simply compare each tuple with each other tuple. This makes it very easy to use, if
your tuple's elements are in the right order. If you want to compare the elements of your tuple in a different
order, however, you'll need to do something extra.
Sorting By Another Column. Let's say we wanted to sort by state name, the third element in the tuple.
We don't want the naive comparison among tuples. We want a smarter comparison that looks at the
elements we choose, not the first element in the tuple. We do this by giving a key function to the sort()
method.
The key function returns an object or a simple sequence of the key values selected from each element to be
sorted. In this case, we want the key function to return the third elements of our county jobs tuples.
def by_state( row ):
    return row[2]
jobData.sort( key=by_state )
Note that we pass the function object to the sort() method. A common mistake is to say jobData.sort(
by_state() ). If we include the ()s, we evaluate the function by_state() once, which is a mistake.
We don't want to evaluate the function; we want to provide the function to sort(), so that sort() can
evaluate the function as many times as needed to sort the list.
Note that if we say by_state(), we evaluate by_state() without any argument values, which is also a type
error. If we say by_state, naming the function instead of evaluating it, then sort() will properly call
the function with the expected single argument.
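The following sketch shows both the mistake and the fix, using just the first two rows of the county data. The variable called_ok is a made-up flag that records whether the mistaken version got past the function call.

```python
jobData = [
    ('001', 'Albany', 'NY', 162692),
    ('003', 'Allegany', 'NY', 11986),
]

def by_state(row):
    return row[2]

# The mistake: by_state() is evaluated right here, with no row to work on,
# so Python raises a TypeError before sort() is ever called.
try:
    jobData.sort(key=by_state())
    called_ok = True
except TypeError:
    called_ok = False
print(called_ok)

# The fix: name the function and let sort() call it once for each row.
jobData.sort(key=by_state)
print(jobData)
```

Since both sample rows have the same state, the sort leaves them in their original order; the point here is only which form runs without an error.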
Sorting By Multiple Fields. Another common process is to sort information by several key fields.
Continuing this example, let's sort the list by state name and then number of jobs. This is sometimes called
a multiple-key sort. We want our data in order by state. Within each state, we want to use the number of
jobs to sort the data.
We do this by creating a tuple of the fields we want to use for sorting.
def by_state_jobs( row ):
    return ( row[2], row[3] )
jobData.sort( key=by_state_jobs )
The sort() method must compare elements of the sequence against each other. If the sort() method is
given a key function, this function is called to create the sort comparison key for each element.
In our case, we've provided a function (by_state_jobs()) that extracts a tuple as the key. The tuple
contains the state and the number of jobs from each row.
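Here is a runnable sketch of the multiple-key sort, using only the four county rows shown earlier. The list named counties is an extra, added just to make the resulting order easy to read.

```python
jobData = [
    ('001', 'Albany', 'NY', 162692),
    ('003', 'Allegany', 'NY', 11986),
    ('121', 'Wyoming', 'NY', 8722),
    ('123', 'Yates', 'NY', 5094),
]

def by_state_jobs(row):
    # The key is a tuple: state first, then the number of jobs.
    return (row[2], row[3])

jobData.sort(key=by_state_jobs)

# Pull out just the county names to see the new order.
counties = [row[1] for row in jobData]
print(counties)
```

All four rows share the state 'NY', so the first part of each key ties and the number of jobs decides the order, from the smallest count to the largest.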
Tip: Debugging List Sorting
There are three kinds of problems that can prevent a customized sort operation from working correctly.
Our key function doesn't have the right form. It must be a function that extracts the key from an item
of the sequence being sorted.
def key( item ):
    return something based on the item
The data in your list isn't regular enough to be sorted. For example, if we have dates that are
represented as strings like '1/10/56', '11/19/85', '3/8/87', these strings are irregular and won't
sort very nicely. As humans, we know that they should be sorted into year-month-date order, but the
strings that Python sees begin with '1/', '11' and '3/', with an alphabetic order that may not be
what you expected.
To get this data into a usable form, we have to normalize it. Normalizing is a computer science term
for getting data into a regular, consistent, usable form. In our example of sorting dates, we'll need
to use the time or datetime modules to parse these strings into proper Python objects that can be
compared.
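One way to normalize date strings is to parse them inside the key function. The sketch below makes several assumptions: the strings are month/day/two-digit-year, every date belongs to the 1900s, and the function name by_date is our own invention. The date '12/25/55' is not from the example above; it was added so that the alphabetic order and the date order actually differ.

```python
from datetime import datetime

def by_date(text):
    # Parse a string like '3/8/87' into a datetime object.
    parsed = datetime.strptime(text, "%m/%d/%y")
    # The %y code places small two-digit years in the 2000s.
    # Assumption: our data is all from the 1900s, so move those back.
    if parsed.year >= 2000:
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed

dates = ['11/19/85', '3/8/87', '1/10/56', '12/25/55']

alphabetic = sorted(dates)    # compares the strings character by character
dates.sort(key=by_date)       # compares the parsed dates

print(alphabetic)
print(dates)
```

The strings themselves are never changed. The key function only tells sort() how to compare them, which is usually what we want when the original text has to be displayed later.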
Ascending vs. Descending. The default sort is ascending order. We can sort into descending order by
adding the reverse keyword parameter to the sort.
jobData.sort( key=by_state_jobs, reverse=True )
By default, reverse is False, giving us ascending order. When we set it to True, the list is sorted in reverse
order; that is, descending.
The Lambda Shorthand. In reading other programs, you may see something like the following:
jobData.sort( key=lambda row: row[2] )
This lambda is a small, anonymous function definition. These are used sometimes because it saves having
to create a function which is only used once in a single sort() operation. Mentally, you can rewrite this to
the following:
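A minimal sketch of that rewrite, assuming a two-row sample of the jobData list. The function name row_state is a hypothetical stand-in for the anonymous function; any name would do.

```python
jobData = [
    ('003', 'Allegany', 'NY', 11986),
    ('001', 'Albany', 'NY', 162692),
]

# The lambda, spelled out as an ordinary named function.
def row_state(row):
    return row[2]

with_def = list(jobData)
with_def.sort(key=row_state)

with_lambda = list(jobData)
with_lambda.sort(key=lambda row: row[2])

print(with_def == with_lambda)
```

The two sorts produce the same result; the lambda simply skips the step of giving the function a name.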
      1    2    3    4    5    6
1     2    3    4    5    6    7
2     3    4    5    6    7    8
3     4    5    6    7    8    9
4     5    6    7    8    9   10
5     6    7    8    9   10   11
6     7    8    9   10   11   12
In Python, a multi-dimensional table can be done as a sequence of sequences. This table is a sequence of
rows.
Each individual row, in turn, is a sequence of individual cells. This allows us to use mathematical-like
notation. Where the mathematician might say $A_{i,j}$, in Python we say A[i][j]. We want the row i from
table A, and column j from that row.
Building a Table. We can build a table using a nested list comprehension. The following example creates
a table as a sequence of sequences and then fills in each cell of the table.
from __future__ import print_function
table= [ [ 0 for i in range(6) ] for j in range(6) ]
print(table)
for d1 in range(6):
    for d2 in range(6):
        table[d1][d2]= d1+d2+2
print(table)
1. Use a list comprehension to create a six by six table of zeros. Actually, the table is six rows. Each row
has six columns.
The comprehension can be read from inner to outer, like an ordinary expression. The inner list, [ 0
for i in range(6) ], creates a simple list of six zeros. The outer list, [ [...] for j in range(6)
], creates six copies of these inner lists.
2. Print the grid of zeroes.
3. Fill this list of lists with each possible combination of two dice. This is not the most efficient way to
do this, but we want to illustrate several techniques with a simple example. We'll look at each half in
detail.
4. Iterate over all combinations of two dice, filling in each cell of the table. This is done as two nested
loops, one loop for each of the two dice. The outer for loop enumerates all values of one die, d1. The
inner for loop enumerates all values of a second die, d2.
Updating each cell involves selecting the row with table[d1]; this is a list of 6 values. The specific
cell in this list is selected by ...[d2]. We set this cell to the number rolled on the dice, d1+d2+2. This
program produced the following output.
[[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]
[[2, 3, 4, 5, 6, 7], [3, 4, 5, 6, 7, 8], [4, 5, 6, 7, 8, 9],
[5, 6, 7, 8, 9, 10], [6, 7, 8, 9, 10, 11], [7, 8, 9, 10, 11, 12]]
Better-Looking Output. The printed list of lists is a little hard to read. The following loop would display
the table in a more readable form.
>>> for row in table:
...     print(row)
...
[2, 3, 4, 5, 6, 7]
[3, 4, 5, 6, 7, 8]
[4, 5, 6, 7, 8, 9]
[5, 6, 7, 8, 9, 10]
[6, 7, 8, 9, 10, 11]
[7, 8, 9, 10, 11, 12]
As an exercise, we'll leave it to the reader to add some features to this to print column and row headings
along with the contents. As a hint, the "{0:2d}".format(value) string operation might be useful to get
fixed-size numeric conversions.
Summarizing A Table. Let's summarize this two-dimensional table into a frequency table. The values of
two dice range from 2 to 12. If we use a list with 13 elements, these elements will be identified with indexes
from 0 to 12, allowing us to accumulate counts in this list.
fq= 13*[0]
print(fq)
for row in table:
    for c in row:
        fq[c] += 1
print(fq[2:])
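Put together with the table-building step, the summary can be run on its own. In this sketch the table is built directly with a nested comprehension rather than filled in afterwards; that shortcut is our choice, and it produces the same table of dice totals.

```python
# Build the six-by-six table of dice totals in one step.
table = [[d1 + d2 + 2 for d2 in range(6)] for d1 in range(6)]

# One counter for each index from 0 to 12; positions 0 and 1 are never used.
fq = 13 * [0]
for row in table:
    for c in row:
        fq[c] += 1

print(fq[2:])     # counts for the totals 2 through 12
print(sum(fq))    # every one of the 36 combinations is counted once
```

The counts rise from one way to roll a 2 up to six ways to roll a 7, then fall back to one way to roll a 12.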
Using Indexes. There is an alternative to this approach. Rather than strip out each row sequence, we
could use explicit indexes and look up each individual value with an integer index into the sequence.
for i in range(6):
for j in range(6):
c= table[i][j]
fq[ c ] += 1
The outer loop sets the variable i to the values from 0 to 5. The inner loop sets the variable j to the values
from 0 to 5.
We use the index value of i to select a row from the table, and the index value of j to select a column from
that row. This is the value, c. We then accumulate the frequency occurrences in the frequency table, fq.
The first version has the advantage of directly manipulating the Python objects, so it is somewhat simpler.
The second version, however, is more like common mathematical notation, and more like other programming
languages. It is more complex because of a level of indirection. Instead of manipulating the Python sequence,
we access the objects indirectly via their index in a sequence.
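As a self-contained check, the sketch below rebuilds the dice table and computes the frequency table both ways. The names fq_rows and fq_index, and the comprehension used to create the empty table, are ours and are introduced only for this illustration; the two versions should produce the same counts.

```python
# Build the 6 x 6 table of dice totals; d1 and d2 run from 0 to 5.
table = [6*[0] for i in range(6)]
for d1 in range(6):
    for d2 in range(6):
        table[d1][d2] = d1 + d2 + 2

# First version: work with each row object directly.
fq_rows = 13*[0]
for row in table:
    for c in row:
        fq_rows[c] += 1

# Second version: look up each value with explicit integer indexes.
fq_index = 13*[0]
for i in range(6):
    for j in range(6):
        c = table[i][j]
        fq_index[c] += 1

print(fq_rows[2:])
print(fq_rows == fq_index)
```

The first print shows the counts for totals 2 through 12; the second confirms that both loops agree.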
Matrix Addition. We use this latter technique for managing the mathematically defined matrix operations.
Matrix operations are done more clearly with this style of explicit index operations. We'll show matrix
addition as an example, here, and leave matrix multiplication as an exercise in a later section.
m1 = [ [1, 2, 3, 0], [4, 5, 6, 0], [7, 8, 9, 0] ]
m2 = [ [2, 4, 6, 0], [1, 3, 5, 0], [0, -1, -2, 0] ]
m3= [ 4*[0] for i in range(3) ]
for i in range(3):
    for j in range(4):
        m3[i][j]= m1[i][j]+m2[i][j]
In this example we created two input matrices, m1 and m2, each three by four. We initialized a third
matrix, m3, to three rows of four zeros, using a comprehension. Then we iterated through all rows (using
the i variable), and all columns (using the j variable) and computed the sum of m1 and m2.
Given an input sequence, seq, we can easily sort this sequence. This will put all equal-valued elements
together. The comparison for unique values is now done between adjacent values, instead of a lookup
in the resulting sequence.
Unique Values of a Sequence, seq.
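One way to sketch this sort-then-compare idea in Python is shown here. The function name unique and the variable names are our own choices for illustration, not necessarily the steps the algorithm lists.

```python
def unique(seq):
    """Return the distinct values of seq, in sorted order."""
    ordered = sorted(seq)          # equal values are now neighbors
    result = []
    for value in ordered:
        # Compare only with the most recent value we kept,
        # rather than searching the whole result list.
        if not result or value != result[-1]:
            result.append(value)
    return result

print(unique([3, 1, 2, 3, 1, 7]))
```

Because equal values are adjacent after sorting, each value needs just one comparison.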
1. Matrix Formatting.
Given a 6 × 6 matrix of dice rolls, produce a nicely formatted result. Each cell should be printed with
a format like "| {0:2d}" so that vertical lines separate the columns. Each row should end with an
'|'. The top and bottom should have rows of "----"'s printed to make a complete table.
2. Three Dimensions.
If the rolls of two dice can be expressed in a two-dimensional table, then the rolls of three dice can be
expressed in a three-dimensional table. Develop a three dimensional table, 6 × 6 × 6, that has all 216
different rolls of three dice.
Write a loop that extracts the different values and summarizes them in a frequency table. The range
of values will be from 3 to 18.
a= "some"
b= a + " long"
b= b + " string"
None of the string objects ("some", " long" or " string") change. There are two new strings that
are built by this program: "some long" and "some long string". Neither of these change after they
are built as the program runs.
When the program ends, two strings ("some" and "some long string") are associated with variables
a and b. The remaining strings are quietly removed from memory, since they are no longer needed.
While the strings themselves are immutable, the values assigned to our variables reflect our intent to
assemble a long string from smaller pieces.
Since lists do everything tuples do and are mutable, why bother with tuples? Immutable
tuples are more efficient than variable-length lists. There are fewer operations to support. Once the
tuple is created, it can only be examined. When it is no longer referenced, the normal Python garbage
collection will release the storage for the tuple.
Many applications rely on fixed-length tuples. A program that works with coordinate geometry in
two dimensions may use two-tuples to represent ( x , y ) coordinate pairs. Another example might
be a program that works with colors as three-tuples, ( r , g , b ), of red, green and blue levels. A
variable-length list is not appropriate for these kinds of fixed-length tuple.
Wouldn't it be more efficient to allow mutable strings? Variable length strings are most commonly
implemented by imposing an upper limit on a string's length. Having this upper limit is unappealing
because it leads to the possibility of a program having data larger than this upper limit. Indeed, this
buffer overflow problem is at the root of many security vulnerabilities.
This fixed upper limit model is embodied in the C string libraries. Strings can vary in length, but
require the programmer set a fixed upper bound on the length. This amount of storage is allocated,
and the string can vary up to that limit. While this provides excellent performance, it does impose
an arbitrary restriction. Some languages (Java for example) stop gracefully when the string limit is
exceeded, others (C for example) behave badly when strings exceed their declared length.
In effect, Python has strings of arbitrary size. Python does this by creating new strings instead
of attempting to modify existing strings. Python is freed from the security issues associated with
variable length strings and the resulting buffer overflow problem.
I noticed map, filter and reduce functions in the Python reference manual. Shouldn't we cover these?
These functions are actually rather difficult to describe in this context because they reflect a view
of programming that is fundamentally different from the approach we've taken in this book. We're
covering programming from an imperative point of view. These three functions reflect the functional
viewpoint. Both approaches are suitable for newbies. We had to pick one, and the coin toss came up
imperative.
In the long run, these functions aren't that useful. Why? Because the List Comprehension (see List
Construction Shortcuts) does everything that the map() and filter() functions do, making them
unnecessary. The reduce design is often much more clearly expressed with an explicit for or while
statement than with the reduce() function.
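For the curious, here is a small side-by-side sketch of the two styles. The variable names are ours, chosen just for this comparison. We import reduce() from functools, which works in Python 2.6 and is required in Python 3, where reduce() is no longer a built-in.

```python
from functools import reduce

values = [1, 2, 3, 4, 5, 6]

# Functional style: filter() picks the even values, map() doubles them.
doubled_evens_functional = list(
    map(lambda v: 2*v, filter(lambda v: v % 2 == 0, values)))

# The same result from a single list comprehension.
doubled_evens_comprehension = [2*v for v in values if v % 2 == 0]

# Functional style: reduce() folds the values into one total.
total_functional = reduce(lambda a, b: a + b, values, 0)

# The same result from an explicit for statement.
total_loop = 0
for v in values:
    total_loop += v

print(doubled_evens_functional, doubled_evens_comprehension)
print(total_functional, total_loop)
```

Each pair prints matching results; which version reads more clearly is partly a matter of taste.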
CHAPTER
TEN
ADDITIONAL PROCESSING
CONTROL PATTERNS
Exceptions and Iterators
Exception processing is a way to alter the ordinary sequential execution of the statements in our program.
Additionally, an Exception is an object that is raised internally by Python when our program does something
illegal. We can make considerable use of exceptions and exception-handling statements to create event-driven
programs. We'll cover this in The Unexpected : The try and except statements.
In Looping Back : Iterators, the for statement and Generators we'll look closely at some advanced procedural
processing. We'll look at a Python object called an iterator and how we can create generator functions. These
will allow us to define some more sophisticated processing; processing that will help us cope with the kinds
of files we often encounter in the real world.
We see this happen when we do something as simple as provide improper argument values to a function. This
includes dividing by zero, or using math.sqrt() on a negative number. Here's a common kind of exception:
we'll provide improper values to the int() function; it will raise an exception. Since the exception isn't
handled, our one-line program will stop.
>>> int('not a number')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'not a number'
Everything has a data and a processing side to it. Exceptions are no exception. [Well, I thought it was
funny.]
Processing. An exception is an event that changes the sequence of statement execution.
A raise statement interrupts the sequential processing of statements. This statement will also
create an Exception object.
Handlers can process the exception, and use the Exception object.
Data. An Exception object contains information about the exceptional situation. The data object is
created by a raise statement and used by handlers. An exception, at the minimum has a name, but it
can have a tuple of argument values, also.
The use of exceptions has a few important consequences.
1. The places in a program that raise exceptions may be hidden deep within a function or class. They
should be exposed by describing them in the docstring. A phrase like "raises MySpecialException" is
sufficient to alert readers of where exceptions originate.
2. Parts of a program will have handlers to cope with the exceptions. These handlers should handle just
the meaningful exceptions. Some exceptions (like RuntimeError or MemoryError) generally can't be
handled within a program; when these exceptions are raised, the program is so badly broken that there
is no real recovery.
Good and Bad Uses. Exceptions can be overused. Because exceptions change the sequence of statements
that get executed, they can make a program murky and hard to follow.
Exceptions are best used to manage rare, atypical conditions. Exceptions should be considered as different
from expected or ordinary conditions that a program is designed to handle.
Here's one example: accepting input from a person. Exception processing is not typically used to validate
the person's inputs. People make mistakes all the time trying to enter numbers or dates, and these kinds of
errors are not exceptional.
On the other hand, unexpected disconnections from network services are good candidates for exception
processing. These are rare and atypical. Exceptions are best used for handling problems with physical
resources like files and networks.
While exceptions are best applied to rare situations, there is an example in Python where an exception is
used for what appears to be a common situation. In the case of a for statement, there are times when the
loop is ended by a StopIteration exception. The StopIteration exception is not something that your
programs would ever deal with, so this use of exceptions is, well, exceptional.
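If you'd like to see this hidden exception for yourself, the following sketch steps through a list by hand with the built-in iter() and next() functions (next() is available from Python 2.6 on). The variable names are ours. This is only a peek behind the curtain; ordinary programs simply use the for statement.

```python
numbers = iter([10, 20])     # ask the list for an iterator
first = next(numbers)        # 10
second = next(numbers)       # 20
try:
    next(numbers)            # nothing is left
    finished = False
except StopIteration:
    finished = True          # this is the signal that ends a for loop
print(first, second, finished)

# The for statement does all of this for us, quietly handling StopIteration.
for n in [10, 20]:
    print(n)
```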
Python has a large number of built-in exceptions, and you can create new exceptions. Generally, it is better
to create a new exception that precisely captures the situation rather than attempt to bend the meaning of
an existing exception.
try:
    suite
except:
    handler suite
Each suite is an indented block of statements. Any statement is allowed in the suite, including additional
try statements.
If any statement in the try suite raises an exception, the except clauses are examined for a clause
that matches the exception raised.
The normal course of events is that no statement in the try suite raises an exception. If there is no
exception, then the except clauses are silently ignored.
Each except suite is designed to handle specific exceptions. Additionally, a final except suite, with no
specific exception, can be provided that is a catch-all. This final non-specific except suite will be used if no
other except suite matched the exception.
The structure of the try and except statements follows this basic philosophy of exceptions.
1. Attempt the intended suite of statements, expecting them to work.
2. In the unlikely event that an exception is raised in the try suite, find an except clause to handle the
exceptional situation.
3. If no except clause matches the exception raised by the try suite, and there is a generic except clause,
execute that suite to handle the exceptional situation.
4. If there is no handler for the exception that was raised, the program stops with an error. This is what
would have happened if there had been no try statement in the first place.
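The first three steps can be sketched with a small function. The name convert and the messages are ours, invented for this illustration; the function tries a conversion, has one specific except clause, and has a generic catch-all.

```python
def convert(text):
    """Return text as an integer, or None when it can't be converted."""
    try:
        return int(text)                  # step 1: expect this to work
    except ValueError:                    # step 2: a specific handler
        print("not a number:", repr(text))
    except:                               # step 3: the generic catch-all
        print("something else went wrong with", repr(text))
    return None

print(convert("42"))        # no exception; the except clauses are ignored
print(convert("hi mom"))    # int() raises ValueError
print(convert(None))        # int() raises TypeError; only the catch-all matches
```

If we removed the generic except clause, the last call would stop the program with a TypeError traceback, which is step 4.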
Working with IDLE. Here's a simplified example that will show the indentation that IDLE does for us
automatically. This try statement has multiple suites. IDLE will indent automatically after the try clause.
We'll have to use the delete key to outdent one level to enter the except clause.
>>> try:
...     a = int("hi mom")
...     print(a)
... except Exception as e:
...     print("Error:", e)
...     a = 42
...
Error: invalid literal for int() with base 10: 'hi mom'
>>> a
42
1. You start to enter the try statement. When you type the letter y, the color of try changes to orange,
as a hint that IDLE recognizes a Python statement. After the : (which is black), you hit enter at
the end of the first line of the try statement.
2. IDLE indents for you. You type the two statements of the suite of statements. The assignment
statement (a=int("hi mom")) is going to fail when the whole statement is executed. When it raises
an exception, Python will start examining except clauses for a matching exception; the print(a)
statement will never get executed.
3. To outdent, you use the backspace (Macintosh users will use the delete key). Notice that when you
finish spelling except, it changes color. Similarly, when you finish spelling Exception, it also
changes color. Since this statement ends with a :, IDLE will automatically indent for you, so you can
put in the exception-handling suite.
4. At the end of the suite, you don't have any more statements, so you hit enter on a blank line. The
try statement is complete, so IDLE executes the statement. The exception is raised, it matches the
first exception clause, a message is printed, and then variable a is set.
5. This is the output. This is the text of the ValueError which was raised by the attempt to create an
integer value from the string "hi mom".
6. When we ask for the value of a, we see that it has the value assigned in the exception clause, 42.
Since the suite of statements in the try clause always raises an exception, this example is a little contrived.
Let's look at some more typical examples.
sum= 0
try:
    for d in data:
        sum += d
    print(sum/len(data))
except ZeroDivisionError:
    print("No values in data")
except TypeError:
    print("Some value in data is not a number")
4. We set data to define a set of data that we'll average. If we set data to an empty tuple, or a tuple with
non-numeric data, we'll see different types of exceptions.
7. In the try suite, we attempt to compute the sum of the values in the tuple. For certain kinds of
inappropriate input, these statements will raise exceptions.
If data is (), an empty tuple, the try clause will attempt to divide by zero. This will raise an
exception.
If data has a non-numeric element, the try clause will attempt to do a numeric operation on a
string, and raise an exception.
11. We have an except clause to handle a ZeroDivisionError. If this exception is raised, it indicates
that we were given an empty tuple.
13. We also have an except clause to handle a TypeError. If this exception is raised, it indicates that we
attempted to sum a value which was not a number.
You can run the above example three different ways and see the different kinds of exception handling. You
do this by moving the comment ( # ) to choose which value of data you want to use.
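If you'd rather see all three cases in one run, here is a sketch that wraps the same try statement in a function. The function name average, the variable total, and the sample tuples are our own; we also return the result instead of printing it so each outcome is easy to compare.

```python
def average(data):
    """Return the average of data, or a message describing the problem."""
    total = 0
    try:
        for d in data:
            total += d
        return total / len(data)
    except ZeroDivisionError:
        return "No values in data"
    except TypeError:
        return "Some value in data is not a number"

print(average((1.0, 2.0, 3.0, 4.0)))   # ordinary data
print(average(()))                     # empty tuple: division by zero
print(average((1, "two", 3)))          # a string in the data: TypeError
```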
There are two common design patterns for exception handlers. The most common kind of exception handling
will clean up in the event of some failure; it might delete useless files, for example. A slightly less common
kind of exception handling will compute an alternate answer; it might return a complex number instead of a
floating-point number, for example. These choices aren't exclusive and some handlers will both delete
resources and compute an alternate answer.
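The "alternate answer" pattern might look like the following sketch. The function name safe_sqrt is hypothetical; it relies on the standard math and cmath modules, and assumes math.sqrt() raises ValueError for a negative argument, which is what current Python versions do.

```python
import math
import cmath

def safe_sqrt(x):
    """Return the square root of x, switching to a complex result if needed."""
    try:
        return math.sqrt(x)
    except ValueError:
        # math.sqrt() rejects negative numbers, so compute an alternate answer.
        return cmath.sqrt(x)

print(safe_sqrt(16))
print(safe_sqrt(-16))
```

The caller gets a floating-point number in the ordinary case and a complex number in the exceptional one.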
This second syntax example shows us that the Exception object is a kind of container, and you can stuff a
value into the Exception that provides additional information. Usually the value is a String that amplifies
the exception by providing details about the exceptional condition.
raise class ( value )
Built-in exceptions can be raised by giving the exception class name for the class.
raise ValueError("oh dear me")
This statement raises the built-in exception ValueError with an amplifying string of "oh dear me". The
amplifying string in this example, one might argue, is of no use to anybody. This is an important consideration in exception design. When using a built-in exception, be sure that the parameter string pinpoints the
error condition.
It is possible to define a new class of exception objects. We'll return to this as part of the object oriented
programming features of Python, in Defining New Objects. Here's the short version of how to create your
own unique exception class. In this example, we've created a new family of exceptions called MyError.
class MyError( Exception ): pass
This single line defines a subclass of Exception named MyError. You can then raise MyError in a raise
statement and check for MyError in except clauses.
Here's how you raise an exception of your own invention.
class MayNotBeNone( Exception ): pass

def someFunction( param ):
    """Does some processing or raises MayNotBeNone if param is None."""
    if param is None:
        raise MayNotBeNone( "{0!r} is invalid".format(param) )
    # Some Processing
    return "Some Result for {0!r}".format(param)
Here are two examples of using this function. The first example provides a valid argument value, and no
exception is raised. The second example, however, provides an illegal input and an exception is raised by
the function.
>>> someFunction( complex(1,.5) )
'Some Result for (1+0.5j)'
>>> someFunction(None)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in someFunction
__main__.MayNotBeNone: None is invalid
Exceptions can be raised anywhere, including in an except clause of a try statement. Raising an exception
in an exception handler is a way to translate an exception from an internal Python exception to one of
our own exceptions.
class MyError( Exception ): pass
try:
    attempt something risky
except FloatingPointError:
    raise MyError("something risky failed")
This example does some initial processing of the exception in the function do_something() and then
re-raises the original exception for processing by any enclosing try statements. This kind of two-step
processing is often done to do cleanup of the risky statement, and then re-raise the exception so that the
overall application can then log the error or stop running gracefully.
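A minimal sketch of this two-step idea follows. The body of do_something() and the log list are stand-ins we made up for illustration. The key piece is the raise statement with nothing after it, which re-raises the exception currently being handled.

```python
log = []

def do_something(value):
    """Hypothetical risky step: fails when value can't be converted."""
    try:
        return float(value)
    except ValueError:
        # First step: local cleanup (here, just a note in our log).
        log.append("cleaning up after {0!r}".format(value))
        # Second step: re-raise the same exception for enclosing handlers.
        raise

try:
    do_something("not a number")
except ValueError as e:
    # The enclosing try statement still sees the original exception.
    print("Application stopping:", e)
print(log)
```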
Forces and Alternatives. Using raw_input() means that we have a lot of programming to validate the
input, provide help and handle exceptions. Since this programming is almost always the same, we need to
package this as a function.
A single function to handle all kinds of input seems rather complex. Validating a time or date is different
from validating a yes/no answer. It will be easiest to have a family of functions: one for yes/no, another
for dates, another for file names.
A common way to provide help is to reserve an additional keyboard key for this. Apple keyboards have a
help key, but most other computers lack this. Consequently, machines that are designed to run Windows
traditionally use F1. However, when running in IDLE, F1 is captured by IDLE, not by our script. When
running from the command window, F1 is captured by the command window itself as part of command
history processing. Consequently, we'll use the ? key for help.
There is no standard way for a user to say they want to exit from a script. While a character sequence like
ctrl-Q is often used by GUI applications, this doesn't work very well for our scripts. IDLE captures this
key sequence before our script sees it as input. Consequently, we'll use end-of-file or a simple Q to signal
that we want to quit.
A Solution. We'll define a more focused function to get user input. This function will validate the input,
providing useful error messages. It will also provide help when the user asks, and raise an exception when
the user wants to quit.
Since we stole this idea, we'll also steal the name for this function. We'll call it ckyorn() as in "check for
y or n". We can imagine defining a whole flock of these kinds of functions to check for numbers, check for
dates, check for valid directories or file names.
The Contract. In this case, we're going to define a function that has a proper return value that will always
be either "Y" or "N". A request for help ("?") is handled automatically inside this function. A request
to quit is treated as an exception, and leaves the normal execution flow. This function will accept "Q" or
end-of-file (via Ctrl-D; Ctrl-Z Enter on Windows) as the quit signal.
We'll define a new UserQuit exception to signal that the user wants to quit. In a longer program, this
exception permits a short-circuit of all further processing, omitting some potentially complex if statements.
If the user enters Q, we'll raise this exception. What about end-of-file?
We can run a quick experiment to see what exception is produced by the raw_input() function when we
send it an end-of-file signal. We'll show the normally invisible Ctrl-D as ^D.
>>> a=raw_input('test:')
^D
test:Traceback (most recent call last):
File "<stdin>", line 1, in <module>
EOFError
Our input function must transform the built-in EOFError exception into our UserQuit exception. We can
do this by handling the EOFError exception and raising a UserQuit exception.
ckyorn.py
class UserQuit( Exception ): pass

def ckyorn( prompt, help ):
    a= ""
    ok= False
    while not ok:
        try:
            a=raw_input( prompt + " [y,n,q,?]: " )
        except EOFError:
            a= "Q"
            raise UserQuit
        if a.upper() in [ 'Y', 'N', 'YES', 'NO' ]:
            ok= True
        elif a.upper() in [ 'Q', 'QUIT' ]:
            raise UserQuit
        elif a.upper() in [ '?' ]:
            print(help)
        else:
            pass
    return a.upper()[:1]
3. We define our own exception, UserQuit. We'll use this to signal one of two events: the user entered a
Q, or the user signaled an end-of-file to the operating system.
5. The ckyorn() function does a Check for Y or N. This function has two parameters, prompt and help,
that are used to prompt the user and print help if the user requests it.
11. We establish a loop that will terminate when we have successfully interpreted an answer from the user.
We may get a request for help or perhaps some uninterpretable input from the user. We will continue
our loop until we get something meaningful. The post-condition will be that the variable ok is set to
True and the answer, a is one of ("Y", "y", "N", "n").
12. Within the loop, we surround our raw_input() function with a try suite. This allows us to process
any kind of input, including user inputs that raise exceptions. The most common example is the user
entering the end-of-file character on their keyboard. For GNU/Linux it is Ctrl-D; for Windows it is
Ctrl-Z.
14. We handle EOFError by raising our UserQuit exception. This separates end-of-file on ordinary disk
files elsewhere in the program from this end-of-file generated from the user's keyboard. When we get
end-of-file from the user, we need to tidy up and exit the program promptly. When we get end-of-file
from an ordinary disk file, this will require different processing.
17. If no exception was raised, we examine the input character to see if we can interpret it.
If the user entered an expected answer, we set ok. The user's input is in a, which we can return.
If the user enters Q or QUIT, we treat this exactly like an end-of-file; we raise the UserQuit exception
so that the program can tidy up and exit in a completely uniform manner.
If the user enters ?, we can provide a help message and prompt for input again.
25. We return a single-character result only for ordinary, valid user inputs. A user request to quit is
considered extraordinary, and we raise an exception for that.
We can use this function as shown in the following example. Here's a line of a script that uses our new
ckyorn().
allDone= ckyorn(
    help= "Enter Y if finished entering data",
    prompt= "All done?")
Here are the results from running this little script to get a value for allDone.
This example shows how we use exceptions to handle unexpected situations that arise. The most common
sources for these unexpected situations are the operating system and the human user. In the operating system
case, there are resource limits that may lead to unexpected problems: we might be out of disk space or out
of memory. In the human user case, well, people are unpredictable.
Tip: Debugging Exception Handling
First, we may have the wrong exceptions named in the except
clauses. If we evaluate a statement that raises an exception, but that exception is not named in an except
clause, the exception won't get handled.
Since Python reports the name of the exception, we can use this information to add another except clause,
or add the exception name to an existing except clause. We have to be sure we understand why we're
getting the exception and we have to be sure that our handler is doing something useful. Exceptions like
RuntimeError, for example, shouldn't be handled: they indicate that something is corrupt in our Python
installation.
You won't know you spelled an exception name wrong until an exception is actually raised and the except
clauses are matched against the exception. The except clauses are merely potential statements. Once an
exception is raised, they are actually evaluated, and any misspelled exception names will cause problems.
Second, we may be raising the wrong exception. If we attempt to raise an exception, but spelled the
exception's name wrong, we'll get a strange-looking NameError, not the exception we expected.
As with the except clause, the exception name in a raise clause is not examined until the exceptional
condition occurs and the raise statement is executed. Since raise statements almost always occur inside if,
elif or else suites, the condition has to be met before the raise statement is executed.
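Here is a small sketch of the first problem. The function name parse and the misspelling ValueEror are ours, and the misspelling is deliberate. Notice that the function works perfectly well until an exception actually forces Python to look at the except clause.

```python
def parse(text):
    try:
        return int(text)
    except ValueEror:            # deliberately misspelled
        return None

print(parse("12"))               # no exception, so the except clause is never examined

try:
    parse("hi mom")              # now the except clause is evaluated
    problem = None
except NameError as e:
    problem = str(e)
print("NameError:", problem)
```

The same delayed surprise applies to a misspelled name in a raise statement: nothing goes wrong until that statement is actually executed.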
Generally, we prefer to minimize our use of the built-in Python exceptions. There are times when an existing
exception clearly captures the nature of the condition. More often, however, our program has a unique
exception, and we should have a uniquely named exception. By using our own exceptions, rather than
Python exceptions, we avoid conflating our exceptional conditions with Python's own internal exceptional
conditions.
Typically, we only define one or two new exceptions for our own modules. We don't want to define a
large, complex group of exception classes. The typical approach is to define our own general-purpose Error
exception in our module.
exception, you'll see some additional details on what the real problem is. See External Data and Files
for more information.
IndexError Sequence index out of range. This comes from trying to find an item beyond the end of a
sequence: a tuple, list or string. This is always a design error: we shouldn't try to find items that
don't actually exist. One of the most common occurrences is trying to find the first item of a sequence
with no items at all. See Basic Sequential Collections of Data for more information.
This exception almost always means that an if statement is needed so that something more useful can
be done when the sequence is empty.
KeyError Mapping key not found. This comes from trying to find a key that doesn't exist in the map. This
is always a design error: we shouldn't try to find items that don't actually exist. See Mappings : The
dict for more information.
We have a number of solutions: we can fix our program to put the element into the map correctly. Or,
we can use the get(), setdefault() or has_key() method functions to determine if the key exists or
to provide a suitable default value when the key doesn't exist.
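The following sketch shows these alternatives on a small dictionary of our own invention. We use the in operator for the membership test because it works in every version of Python; has_key() does the same job but exists only in Python 2.

```python
counts = {"red": 3}

# get() returns a default value instead of raising KeyError.
print(counts.get("blue", 0))

# setdefault() inserts the default when the key is missing.
counts.setdefault("green", 0)
counts["green"] += 1

# The in operator tests for a key before we try to use it.
if "blue" in counts:
    print(counts["blue"])
else:
    print("no blue yet")
print(counts)
```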
KeyboardInterrupt Program interrupted by user. This happens when the user hits Ctrl-C. The user wants
to exit from our program. Generally, we should not handle this exception. It's better to let our program
stop running when the user wants it to stop.
MemoryError Out of memory. This may be a design problem in our program, or it may be the user's problem
for buying a computer which is too small. If our program consistently runs out of memory, it could be
designed to create too many objects. Almost all algorithms have two variations: one which operates
in less time, and another which uses less memory. These design considerations are beyond the scope
of this book.
OSError OS system call failed. This can happen any time we deal with any operating system resource.
This general exception covers all of the various kinds of problems that can occur. When you print the
exception, you'll see some additional details on what the real problem is.
RuntimeError, SystemError Unspecified run-time error or an internal error in the Python interpreter.
When this happens, Python simply can't cope with something. This is rarely the fault of your program.
More likely, you've got some complex problem with your operating system, Python or some add-on
modules. If the problem is consistent, you should consider that you may have more serious problems
with your computer. You may have viruses, spyware or other corrupt files.
SystemExit Request to exit from the interpreter. This exception is raised by the sys.exit() function.
TypeError The types of data don't make sense with the function or operator. This is a more serious design
error. For example, "2"+3 is an example of a TypeError. If we mean to perform arithmetic, one of
the values needs to be converted to a number. If we mean to concatenate strings, one of the values
needs to be converted to a string.
UnicodeError Unicode related error. This happens when we attempt to process a Unicode string that isn't
properly encoded. This often happens when reading Unicode data from files or other network sources.
In this respect, it is like an IOError exception, and should be handled similarly.
ValueError A function was given an inappropriate argument value of a valid data type. The most notable
example is attempting to take the square root of a negative number. Because you provided a number,
the data type is valid. However the value of the data was not valid.
Compare math.sqrt(-1) with math.sqrt("Hello Dolly"). The first is sometimes reported as a
ValueError because the type is right, but the value's range is inappropriate. The second is a TypeError.
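You can try this comparison with a short loop like the sketch below. The list named seen is ours, added so we can record which exception each value produced. On current Python versions, as far as we know, the first value is reported as a ValueError and the second as a TypeError.

```python
import math

seen = []
for bad in (-1, "Hello Dolly"):
    try:
        math.sqrt(bad)
    except ValueError as e:
        seen.append("ValueError")     # right type, wrong value
        print("ValueError:", e)
    except TypeError as e:
        seen.append("TypeError")      # wrong type altogether
        print("TypeError:", e)
print(seen)
```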
ZeroDivisionError The second argument to a division or modulo operation was zero. This is a design
error, also. It is easy to check for this situation in an if statement and do something more useful than
raise an exception.
The following exceptions are more typically returned at compile time before your program can even
execute. These errors indicate an extremely serious error in the basic construction of your program. While
these exceptional conditions are a necessary part of the Python implementation, there's little reason for a
program to handle these errors.
ImportError Import can't find the module, or can't find a requested name within the module.
IndentationError Improper indentation.
NameError Name not found either locally (inside the function) or globally.
NotImplementedError Method or function hasn't been implemented yet.
SyntaxError Invalid syntax.
TabError Improper mixture of spaces and tabs.
UnboundLocalError Local name referenced but not bound to a value.
The following exceptions are the internal definitions on which Exception objects are based. Normally, these
never occur directly. You would use these when designing a new exception of your own.
Exception Common base class for all exceptions.
StandardError Base class for all standard Python exceptions.
ArithmeticError Base class for arithmetic errors.
EnvironmentError Base class for I/O related errors.
LookupError Base class for lookup errors.
There are a number of common character-mode input operations that can benefit from using exceptions to
simplify error handling. All of these input operations are based around a loop that examines the results of
raw_input and converts this to expected Python data.
All of these functions should accept a prompt, a default value and a help text. Some of these have additional
parameters to qualify the list of valid responses.
All of these functions construct a prompt of the form:
your prompt [valid input hints,?,q]:
If the user enters a ?, the help text is displayed. If the user enters a q, an exception is raised that indicates
that the user quit. Similarly, if the KeyboardInterrupt or any end-of-file exception is received, a user quit
exception is raised from the exception handler.
Most of these functions have a similar algorithm.
General User Input Function Algorithm
1. Construct Prompt. Construct the prompt with the hints for valid values, plus ? and q.
2. While Not Valid Input. Loop until the user enters valid input.
Try the following suite of operations.
Prompt and Read. Use raw_input to prompt and read a reply from the user.
Help? If the user entered ?, provide the help message.
Quit? If the user entered q or Q, raise a UserQuit exception.
Try the following suite of operations
Convert. Attempt any conversion.
Range Check. If necessary, do any range checks. For some prompts, there will be
a fixed list of valid answers. For other prompts, there is no checking required.
If the input is valid, break out of the loop.
In the event of an exception, the user input was invalid.
Nothing?. If the user entered nothing, and there is a default value, return the default
value.
3. Result. Return the validated user input.
In the event of an exception, this function should generally raise a UserQuit exception.
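The algorithm above can be sketched as a small function. This is only a sketch under stated assumptions, not a solution to the exercises that follow: the names ask_int, UserQuit and the read parameter are made up for illustration. The read parameter defaults to Python 3's input() (the Python 2 text uses raw_input()), and the check for an empty reply is done before the conversion to keep the sketch short.

```python
class UserQuit(Exception):
    """Raised when the user asks to quit (hypothetical name)."""


def ask_int(prompt, default=None, help_text="Enter a whole number.", read=input):
    """Prompt until the reply is an integer; a sketch of the general algorithm."""
    # Step 1: construct the prompt with hints for valid values, plus ? and q.
    full_prompt = "%s [integer,?,q]: " % prompt
    # Step 2: loop until the user enters valid input.
    while True:
        try:
            reply = read(full_prompt).strip()
        except (KeyboardInterrupt, EOFError):
            # An interrupt or end-of-file counts as the user quitting.
            raise UserQuit()
        if reply == "?":
            print(help_text)
            continue
        if reply in ("q", "Q"):
            raise UserQuit()
        if reply == "" and default is not None:
            return default
        try:
            value = int(reply)  # Convert
        except ValueError:
            print("Not a valid integer: %r" % reply)
            continue
        # Step 3: return the validated user input.
        return value
```

Passing a different read function makes the loop easy to try without typing at a keyboard; for example, read=lambda p: "42" always answers 42.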
Exercises
1. ckdate
Prompts for and validates a date. The basic version can require that dates have a specific format, for
example mm/dd/yy. A more advanced version can accept a string to specify the format for the input.
Much of this date validation is available in the time module, which will be covered in Time and Date
Processing : The time and datetime Modules. This ckdate() function must not return bad dates or
other invalid input.
2. ckint
Display a prompt; verify and return an integer value. This version has no range checking, that is done
by a separate function that gets an integer value in a given range.
3. ckitem
Build a menu; prompt for and return an item from the menu of choices. A menu is a numbered list
of values, the user selects a value by entering the number. The function should accept a sequence of
valid values, generate the numbers and return the actual menu item string. An additional help prompt
of "??" should be accepted, in addition to writing the help message, this additional help will also
redisplay the menu of choices.
4. ckkeywd
Prompts for and validates a keyword from a list of keywords. This is similar to the menu, but the
prompt is simply the list of keywords without numbers being added.
5. ckpath
Display a prompt; verify and return a pathname. An advanced version can use the os.path module
for information on construction of valid paths. This should check the user input to confirm that the
path actually exists. See Modules : The unit of software packaging and assembly for more information.
6. ckrange
Prompts for and validates an integer in a given range. The range is given as separate values for the
lowest allowed and highest allowed value. If either is not given, then that limit doesn't apply. For
instance, if only a lowest value is given, the valid input is greater than or equal to the lowest value. If
only a highest value is given, the input must be less than or equal to the highest value.
10.1. The Unexpected : The try and except statements
7. ckstr
Display a prompt; verify and return a string answer. This is similar to the basic raw_input(), except
that it provides a simple help feature and raises exceptions when the user wants to quit.
8. cktime
Display a prompt; verify and return a time of day. This is similar to ckdate(); a more advanced
version would use the time module to validate inputs. The basic version can simply accept a hh:mm:ss
time string and validate it as a legal time.
9. ckyorn
Prompts for and validates yes/no. This is similar to ckkeywd, except that it tolerates a number of
variations on yes (YES, y, Y) and a number of variations on no (NO, n, N). It returns the canonical forms:
Y or N irrespective of the input actually given.
the problem; the programmer made no effort to determine the actual cause or remediation for the
exception.
In their defense, exceptions can simplify complex nested if statements. They can provide a clear
escape from complex logic when an exceptional condition makes all of the complexity moot. Exceptions should be used sparingly, and only when they clarify or simplify exposition of the algorithm.
A programmer should not expect the reader to search all over the program source for the relevant
exception-handling clause.
For example, the quadratic equation example we have been using for this chapter can create two
exceptions, each of which is much more easily and clearly checked with simple if statements.
The Iterator
The iterator does any initial calculations similar to
the way a function is evaluated. A yield statement
provides an initial value.
The iterator is resumed where it left off; it yields the
next value. Being resumed right after the yield
statement is the unique feature of an iterator.
The iterator suite ends or executes return. This
raises a StopIteration exception to indicate there
are no more values.
The iterator may never get a chance to finish
processing normally.
Looking ahead, the names of these clauses point toward the names of the method functions of iterator
objects. This is not something we'll dwell on here, but that's how we chose those names for the clauses in
the contract.
Sources of Iterators. A sequence object is the most common source for an iterator. When we say something
as simple as the following, we are asking a list object to secretly hand off an iterator to the for statement.
In this example, the list [1, 2, 3, 4, 5] gives a hidden iterator object to the for statement.
for i in [1, 2, 3, 4, 5]:
    print(i)
Here's the secret hand-off that happened under the hood. This little example will show you that an iterator
object is created. In the next section we'll see how to make use of that iterator object.
>>> iter( [1,2,3,4,5] )
<listiterator object at 0x70ef0>
We can look at our contract and see what happens under the hood when a for statement uses the list
[1,2,3,4,5].
The for statement implicitly uses the iter() function to request an iterator from the list [1,2,3,4,5]
and saves this iterator in a private variable somewhere.
Conceptually: forIter= iter( [1,2,3,4,5] ).
The for statement calls the iterator's next() method; the iterator yields the individual items in the
list so that the for statement can execute the suite.
Conceptually: i= next(forIter).
When the iterator runs out of values it raises an exception. The for statement handles this exception
and finishes normally.
Conceptually, there's a try block that quietly handles the StopIteration exception.
Explicit Iterators. In addition to Python's implicit use of iterators, we can explicitly ask for an iterator
object. We can then manipulate that iterator to do more sophisticated processing on the underlying sequence.
Let's look at this for statement.
total= 0
for j in range(1,21,2):
    total += j
print(total)
Here is the equivalent program, written with an explicit iterator object and a while statement. From this,
we can see precisely what the for statement does for us.
total= 0
try:
    forIter= iter( range(1,21,2) )
    while True:
        j= next(forIter)
        # The original suite
        total += j
except StopIteration:
    pass
print(total)
Line 3. Initially, we get the iterator object and save it in a local variable, forIter.
Line 5. We get the next value from the iterator object. We then execute the suite of statements from
the original for statement.
Line 8. When the iterator raises StopIteration, there are no more values to process.
The iter() Function. Here is the formal definition for the iter() function which exposes the iterator
object to us. This is what the for statement uses under the hood to get the iterator for a sequence.
iter(sequence) → iterator
Returns an iterator object from the given sequence.
We'll see that a variety of collections have a head-tail structure. There is a header (usually a fixed number
of items) and a tail that comprises all the rest of the items. When we create a spreadsheet, for example, we
often have a fixed number of rows of column titles and an indefinite number of rows of data.
Tip: Debugging Iterators
There are several common problems with using an explicit iterator.
Skipping items without processing them.
Processing the same item twice.
Getting a StopIteration exception raised when trying to skip the first item.
10.2. Looping Back : Iterators, the for statement and Generators
Generally, the best way to debug a generator is to use it in a very simple iteration statement that prints the
result of the iteration. Printing the items will show us precisely what is happening. We can always change
the print statement into a comment by putting a # in front of print.
Here's a good design pattern for skipping the first item in a sequence.
i = iter( someSequence )
next(i)  # Skips an item on purpose
try:
    while True:
        a= next(i)
        # some processing
        print(a)
except StopIteration:
    pass
Skipping items happens when we ask for the next() method of the iterator one too many times.
Processing an item twice happens when we forget to ask for the next() method of the iterator. We see it
happen when a program picks off the header items, but fails to advance to the next item before processing
the body.
Another common problem is getting a StopIteration exception raised when trying to skip the header item
from a list or the header line from a file. In this case, the file or list was empty, and there was no header.
Often, our programs need the following kind of try block to handle an empty file gracefully.
i = iter( someSequence )
try:
    next(i)  # Skips an item on purpose
except StopIteration:
    pass  # No items -- this is a valid situation, not an error
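Here's a small runnable sketch that puts the two pieces together: skip the header on purpose, and treat an empty sequence as a valid situation. The function name tail_items and the sample data are made up for illustration.

```python
def tail_items(some_sequence):
    """Return the items after the first one; an empty input gives an empty list."""
    i = iter(some_sequence)
    try:
        next(i)  # Skips the header item on purpose
    except StopIteration:
        return []  # No items: a valid situation, not an error
    # The for statement inside the list display handles StopIteration for us.
    return [a for a in i]


print(tail_items(["title", "row 1", "row 2"]))
print(tail_items([]))
```

Because the remaining items are consumed by a for clause, only the deliberate skip needs its own try block.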
Yes. It looks the same. Here's the small, but profound difference:
xrange() is an iterator.
range() creates a list from which an iterator is then created.
This sets the stage for us writing functions that are more like xrange() instead of range().
You can think of the range() function as having a definition like the following. The range() function result
is the list created by iterating through the xrange() generator.
def range(start,stop,step):
    return list( xrange(start,stop,step) )
Here's the formal definition for xrange(). It looks a lot like range().
xrange([start,] stop[, step]) → generator
Returns a generator (also known as a generator iterator) that yields the same list of values that the
range() function would return. However, since this is a generator, a list is not actually created in
advance, making this faster and more memory efficient.
Important: Python 3
There will be a slight change in Python 3.
The xrange() function is actually much more useful than range(). xrange() is so much more useful
that the Python 3 range() function will be an iterator (exactly like Python 2 xrange()).
The Python 2 range() function which creates a list object will be removed from Python 3.
Why? We rarely want the actual list object from the range() function. It's far more common to want the
iterator. The few times we need the list object, we can use the list() factory function to build the list.
someList = list( range(1, 21, 2) )
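As a small illustration, here is the earlier sum of odd numbers written so that it runs the same way under either version of range(). The variable names are only for this sketch; the list is built explicitly, and only at the point where we actually want a list.

```python
odds = range(1, 21, 2)   # under Python 3, no list is built here
total = 0
for j in odds:
    total += j
print(total)

some_list = list(odds)   # build the list only when it is needed
print(some_list)
```

The for statement doesn't care which kind of object it was given; it only asks for an iterator.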
def name ( [ parameter [, ...] ] ):
    suite
The name is the name by which the generator function is known. It must be a legal Python name; the rules
are the same for function names as they are for variable names. The name must begin with a letter (or _)
and can have any number of letters, digits or _. See Python Name Rules.
Each parameter is a variable name; these names are the local variables to which actual argument values will
be assigned when the function is applied. We don't type the [ and ]s; they show us that the list of names is
optional. We don't type the ...; it shows us that any number of names can be provided. Also, the , shows
that when there is more than one name, the names are separated by ,s.
The suite (which must be indented) is a block of statements that must include a yield statement to yield
values for the generator. Any statements may be in this suite, including nested function definitions.
As with functions, the first line of a generator is expected to be a document string (generally a triple-quoted
string) that provides a basic description of the function. See Functions Style Notes.
yield Syntax. The yield statement provides each value to the for statement.
yield expression [, ...]
The for statement must have the proper number of variables to match the number of expressions in the
yield statement.
The presence of the yield statement in a function body means that the function is actually a generator
object. The generator will have the complete interface necessary to work with the for statement.
A return statement can be used to end the iteration. If used, the return statement doesn't return anything,
and cannot have an expression. In a generator, the return statement raises the StopIteration exception
to signal to the for statement that we are finished.
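Here's a minimal sketch of a return statement ending a generator early. The function name up_to_limit and its parameters are made up for illustration; the return has no expression, as described above.

```python
def up_to_limit(values, limit):
    """Yield values until one exceeds limit (hypothetical example)."""
    for v in values:
        if v > limit:
            return  # ends the iteration; the for statement sees StopIteration
        yield v


print(list(up_to_limit([2, 3, 5, 7, 11, 13], 6)))
```

The caller never sees the exception directly; the list() function (like a for statement) simply stops asking for values.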
A Goofy Example. Here's an example that uses a sequence of yield statements to yield a fixed sequence
of values. While not terribly practical, this shows how the yield statement fulfills the iterator contract with
a for statement.
def primeList():
    yield 2
    yield 3
    yield 5
    yield 7
    yield 11
    yield 13
After defining this generator, here's what we see when we use it. This behaves as if we had the list [2,
3, 5, 7, 11, 13]. For a small list like this, the difference is invisible. However, for very large lists, the
generator doesn't use as much memory.
>>> for i in primeList():
...     print(i)
...
2
3
5
7
11
13
Generate Craps Dice Rolls. Here's a slightly more sophisticated generator that yields a sequence of dice
throws ending with a seven or the desired point. This generator creates pairs of random dice. If the pair of
dice are 7 or the point, the generator yields the final roll to the for statement, and then finishes.
While the pair of dice is not 7 and not the point, then the pair will be yielded to the for statement. Each
iteration of the for statement suite will generate the next pair of dice.
import random
def genToPoint( point=None ):
    d1,d2= random.randrange(1,7),random.randrange(1,7)
    while d1+d2 != 7 and d1+d2 != point:
        yield d1, d2
        d1,d2= random.randrange(1,7),random.randrange(1,7)
    assert d1+d2 == 7 or d1+d2 == point
    yield d1,d2
Here are two examples of using this generator function in a for statement. Since these examples depend on
random numbers, your mileage may vary. In the first case, the generator yielded three spins, ending with a
7. In the second case, it yielded four spins before stopping. In both cases, we finished with Craps.
>>> for r in genToPoint(10):
...     print(r)
...
(3, 3)
(3, 3)
(4, 1)
(4, 1)
(1, 6)
We may have a Reduce problem where we're summarizing (or reducing) a larger sequence into
a shorter, summarized sequence. A sequence of stock transaction details that must be added up to create a
sequence of subtotals, for example.
Head-Tail. In the Head-Tail pattern, we have one or more items which are a preamble or heading. The
most common example of this is data that we get from spreadsheets with column titles. We may, for example,
want to download stock quotes from the internet; these files often have column titles or other preambles in
front of the real data. Generally, the preamble is of a fixed size, and we can look at sample data to see how
many lines of column titles need to be skipped.
The solution to the head-tail problem uses an explicit iterator. These solutions have this general pattern:
iterator = iter( sequence )
# Consume a heading item.
head = next(iterator)
# Process the tail items.
for variable in iterator :
    process a tail item
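Here's the pattern filled in with concrete statements. This is only a sketch: the rows list is made-up sample data standing in for a downloaded file of quotes, and the column names are assumptions.

```python
rows = ["Symbol,Price", "AAA,10.5", "BBB,20.25"]  # made-up sample data

iterator = iter(rows)
head = next(iterator)        # consume the one heading item
columns = head.split(",")

total = 0.0
for row in iterator:         # the for statement continues with the tail items
    symbol, price = row.split(",")
    total += float(price)

print(columns, total)
```

The heading line never reaches the float() conversion, so the column titles can't cause a ValueError in the loop.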
Sometimes, the heading is more complex than a fixed number of lines. In this case, we may have to do more
sophisticated processing to skip the header. For example, the header may end with an item that has a long
string of -s.
In this case, we may want to use an explicit iterator object. If we provide a sequence to the for statement,
it will request an iterator from the sequence. If, on the other hand, we provide the for statement with an
explicit iterator, the for statement won't reinitialize it.
myFakeData= """some title
some other title
---------------
the real data
more real data"""
# Create an iterator
myIter= iter( myFakeData.splitlines() )
# Find the last line of the header
for line in myIter:
    if line.startswith("--"):
        break
# Process the data after the header
for data in myIter:
    print(data)
def generator ( parameters ):
    Create an empty accumulator
    for i in some sequence :
        Look at item i
        if not part of the current group? :
            yield the accumulated response
            Create a new, empty accumulator
        Accumulate i in the accumulator
    yield the final accumulated response
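Here's one concrete instance of this outline, offered as a sketch. The generator name runs and the grouping rule (consecutive equal items belong to the same group) are made up for illustration. The two "if group" tests are an addition to the outline; they keep the sketch from yielding an empty accumulator.

```python
def runs(sequence):
    """Yield lists of consecutive equal items (hypothetical example)."""
    group = []                   # create an empty accumulator
    for item in sequence:
        if group and item != group[-1]:
            yield group          # the look-ahead item starts a new group
            group = []
        group.append(item)       # accumulate the item
    if group:
        yield group              # the final accumulated group


print(list(runs("aabccc")))
```

Each group is yielded only when the first item of the next group has been seen; that first item is the look-ahead.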
There is a common variation on this theme. This combines the head-tail pattern with the look-ahead
pattern. This is slightly less desirable because there are two copies of the "Seed a new accumulator with i"
statement(s). However, for some kinds of complex processing, this may be difficult to avoid.
def generator ( parameters ):
    iterator = iter( some sequence )
    Skip the heading
    i = next(iterator)
    Seed a new accumulator with i
    for i in iterator :
        Look at item i
        if part of the group? :
            Accumulate i in the accumulator
        else:
            yield the accumulated response
            Seed a new accumulator with i
    yield the final accumulated response
12:46:50,109 INFO
12:46:50,109 INFO
12:46:50,125 INFO
12:57:14,046 INFO
12:57:18,875 INFO
12:57:19,625 INFO
We can process this more conveniently if we change each complete message into a tuple of lines. This makes
multi-line messages (like the very first one in the log) and single-line messages (like the remaining lines)
similar enough that processing is much easier.
We'd like to rearrange this text into a list of tuples. Each tuple is a complete log message.
Item 0 of a message tuple should be the decomposition of the message header line. We can break it
down into a 5-tuple with the time stamp, the severity, the process name, the user name and all of the
following text. The time stamp can itself be a 7-tuple of year, month, day, hour, minute, second and
millisecond.
The remaining items of a message (if any) are simply additional lines of text from the message in the
log file.
We'd like the first message in the log, which occupies the first 5 lines, to become the following Python list.
Item 0 is a tuple which describes the header line, items 1 to 4 are the extra text after that header line. When
we look at item 0 of the tuple, we see that it is a tuple with the time stamp, the severity, the process name
and two empty strings for the missing items on the first line. The time stamp is also a tuple.
[ ( (2003,7,28,12,46,42,843), 'INFO', 'main', '', ''),
"-----------------------------------------------------------------------------",
"XYZ Management Console initialized at: Mon Jul 28 12:46:42 EDT 2003",
"Package Build: 452",
"-----------------------------------------------------------------------------" ]
We'd like the second message in the log, which occupies the next 1 line, to become the following Python list.
Item 0 is a tuple which describes the header line, and this is the only item in the list. When we look at item
0 of the tuple, we see that it is a tuple with the time stamp, the severity, the process name and an empty
string for the empty [] which would normally have a username. The time stamp is also a tuple.
[ ( (2003,7,28,12,46,50,109), 'INFO', 'main', '',
'Export directory does not exist' ) ]
The Goal. Once we've done this transformation from the original text to these Python structures, we can
then easily scan the log for interesting messages. Item 0 of each message is the header tuple, item 1 of this
header tuple is the severity. If the log is transformed to a list, this processing can be a simple filter. Our
goal, then, is to use simple filters to find interesting log messages.
Assume our log is transformed into a variable named logList and we want to see all messages where the
severity is ERROR. A filter that keeps just the headers of these messages could look like this.
[ message[0] for message in logList if message[0][1] == 'ERROR' ]
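To see this filter at work, here's a small sketch with a two-message logList. The first message is the single-line INFO message shown earlier. The second message is made up purely so the filter has something to find; its text is not from the real log.

```python
logList = [
    [((2003, 7, 28, 12, 46, 50, 109), 'INFO', 'main', '',
      'Export directory does not exist')],
    # Made-up ERROR message, added only to give the filter something to find.
    [((2003, 7, 28, 12, 57, 14, 46), 'ERROR', 'main', '', 'hypothetical failure'),
     "an extra line of detail"],
]

errors = [message[0] for message in logList if message[0][1] == 'ERROR']
print(errors)
```

Only the header tuple of the ERROR message survives the filter; the extra line of detail is dropped because we kept just message[0].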
Generator Design. Here's the start of a generator which will collect a message and all of the following
lines into a tuple of strings. This has the basic pattern of a look-ahead generator. We'll accumulate a
complete message by looking ahead to the first line of the next message. This first line of the next message
is our look-ahead. We can yield the previous message, and then reset our processing to begin with this first
line.
logScanGenerator.py
        else:
            count += 1
    yield count
3. The spins variable defines our sample data. This might be an actual record of spins, or it could be
created by another program.
8. We define our countReds() generator. This generator initializes count to show the number of non-blacks before a black. It then steps through the individual spins, in the order presented. For non-blacks, the count is incremented.
11. For black spins, however, we yield the length of the gap since the last black. This value is given to
the for statement to be processed.
When the for statement's suite is done, it will resume the generator right after the yield statement: the
count will be reset, and the for loop will advance to examine the next number in the sequence.
16. When the sequence is exhausted, we also yield the final count. This last gap count may have to be
discarded for certain kinds of statistical analysis because it doesn't represent an actual black spin.
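The behavior these notes describe can be sketched from the notes themselves. This is a reconstruction under assumptions, not the original listing: representing each spin as a color string is an assumption, the sample spins data is made up, and the line numbers in the notes will not match this sketch.

```python
# Made-up sample data: each spin is represented only by its color.
spins = ["red", "black", "red", "red", "green", "black", "red"]


def countReds(aList):
    """Yield the count of non-black spins before each black spin."""
    count = 0
    for color in aList:
        if color == "black":
            yield count   # the gap since the previous black
            count = 0     # reset once the for statement's suite has run
        else:
            count += 1
    yield count           # final gap; not ended by an actual black spin


print(list(countReds(spins)))
```

With this sample data the final value counts the single red after the last black, which is the gap that may need to be discarded.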
This program shows how we use the countReds() generator function. In this case, we reduce the values by
accumulating the total of all gaps and the number of gaps. We can then compute the average gap size.
total= 0
count= 0
for gap in countReds(spins):
    total += gap
    count += 1
print(count,"gaps")
print("average size",total/count)
>>> the_tuple = ( 9, 7, 3, 12 )
>>> for v in reversed( the_tuple ):
...     print(v)
...
12
3
7
9
Write a generator named punctuation() which will examine each character of a string, separating it
into sequences of characters and individual punctuation marks. A simple version can separate the line
into two kinds of tokens:
Punctuation marks (especially -, :, the comma, and space).
Sequences of digits and sequences of letters.
You'll be following the look-ahead design to accumulate sequences of digits and letters until your
next character is a punctuation mark; then you'll yield the sequence of digits or letters you found. You
can then yield the punctuation mark. Then you can reset your accumulator to be an empty string.
The pattern match can be a function that is something like the following:
def isHeader( aLine ):
    sequence = [ token for token in punctuation( aLine ) ]
    return (sequence[1] == '-' and sequence[3] == '-' and sequence[5] == ' '
        and sequence[7] == ':' and sequence[9] == ':' and sequence[11] == ',')
The final step is to rewrite the logScan() function to use isHeader() instead of line[:4] == "2003".
In Text Processing and Pattern Matching : The re Module we'll look at the re module that can
improve how this parser works.
3. Two For One.
The pattern matching function, isHeader() broke down a header into individual data elements, including all of the punctuation marks and all of the words and digits. Once we recognized a header, we
then parsed the header line a second time to break out the various fields. Can't we do both operations
at once?
Take a closer look at the isHeader() function. Does it have to return True, or can it return any value
that is equivalent to true? What if it returned the nested tuples of ( (year, month, day, hour,
minute, second, millisecond), severity, thread, user, string) for headers?
This would mean that the first 25 or so tokens would participate in this parsing. The remaining tokens
were simply the text at the end of the message line. We can transform the tail end of the line from
a sequence of strings to a single string with something like "".join( line[25:] ). This will begin
with the 26th item (in position 25) and recreate the original string from this sequence.
Rewrite isHeader() to return a proper tuple for headers and False for lines that can't be recognized
as a header.
The final step is to rewrite the logScan() function to use this revised isHeader() function.
4. Improve Generator Efficiency.
In Geeky Generator Example: Web Server Logs we have a serious performance problem. Look at the
if currentMessage: statement on the 7th line of the example. This condition is always true except
the very first time through this loop. How can we avoid that needless condition checking?
Rather than ask if we have found a valid header line, we should assure that we have already seen the
valid header line. We'll need to change the initialization of our loop to set currentMessage to a valid
message header instead of an empty list.
We will need to rewrite this generator in three steps:
    1. Use the head-tail design and get an explicit iterator for individual lines.
    In the example, we used the splitlines() method of a string (or the readlines() method of a file),
    and gave this sequence to a for statement. Instead of this, we need to get the iterator for the sequence
    of lines using the iter() function. Once we have the iterator, we can locate the first valid line.
    2. Find the first valid header and initialize currentMessage.
    We are going to use the iterator's next() function to get lines, looking for one that matches the header
    line pattern. In the situation where we can't initialize currentMessage, we have an empty log file and
    we're done without yielding anything.
    3. Simplify the for loop.
    Now that currentMessage is initialized, we can have a proper loop that examines the next line to see if
    it is a header. If so, yield the currentMessage knowing that it is a valid message, reset currentMessage
    to have the next header line. Otherwise, accumulate the line in the current message. The change to
    the for statement is minor; we use the iterator instead of the list of lines.
We can eliminate the if currentMessage: statement. This processing loop will now run considerably
faster.
In the Exceptions FAQ you said that exceptions were for rare events, but here you're using them for the in
The general statement still stands: exceptions are for rare or exceptional situations. A clutter of
exceptions is more confusing than a well-planned set of ordinary if statements.
Generator functions must use an exception to end the processing loop. This is an example of the out-of-band signal design pattern. We need some kind of signal that isn't a piece of data. The alternative
is to define some particular data value as an end-of-iteration sentinel. Doing this makes that sentinel
value sacred, and limits our flexibility. Rather than pick a sentinel value that would, in effect, be an
illegal value for any program, we raise an exception instead.
In the C programming language, the designers elected to use a particular ASCII character as the end-of-string sentinel value. One consequence of that is that files which contain that specific ASCII character
cannot be processed easily by a C program. Some C-language programs discard that character, other
C-language programs can't read files with that character.
CHAPTER
ELEVEN
First, we'll cover a simple un-ordered collection, called a set, in Collecting Items : The set.
In Mappings : The dict, we'll cover the mapping collection, called a dictionary. This type of collection maps
a label (or key) to a value.
We can show how Python uses dictionaries and sequences to handle arbitrary lists of parameters to functions
in Defining More Flexible Functions with Mappings.
Mutability. We have to look at two aspects of mutability. The items in a set must be immutable: strings,
numbers and tuples. We can't easily create a set which contains a bunch of list objects.
Why not?
Let's say we had two lists:
list_one = [1, 2]
list_two = [1]
Assume, for the moment, that we could somehow create a set from these two lists.
What happens when we do this?
list_two.append( 2 )
Oops. Now we have two lists in the set which appear identical.
This clearly must be forbidden. That's one aspect of mutability: all items in a set must be immutable.
The second aspect of mutability concerns the sets themselves. There are two flavors of sets: set and frozenset.
The ordinary set is mutable, in the same way that a list is mutable. A frozenset, on the other hand, is
immutable, more like a tuple.
As with tuples, we can create a new, larger frozenset from the union of two other frozensets. The original
sets don't change, but we can use them to create a new set.
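Both aspects can be seen in a few lines. This is only a sketch; the variable names are made up for illustration.

```python
list_one = [1, 2]
try:
    bad = set([list_one])    # a list is mutable, so it can't be a set item
except TypeError as error:
    print("TypeError:", error)

ok = set([(1, 2), (1,)])     # tuples are immutable, so this works

a = frozenset([1, 2, 3])
b = frozenset([3, 5, 8])
c = a | b                    # a new frozenset; a and b are unchanged
print(sorted(c))
```

The attempt to put a list into a set fails right away with a TypeError, so the confusing situation described above can never arise.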
fib This is a set of Fibonacci numbers. The value 1 is duplicated in the input sequence. The
set can't have duplicates, so the resulting set value will be set([1,2,3,5,8,13]).
prime This is a set of prime numbers. There are no duplicates in the input sequence, so the set
has the same number of elements.
words This is a set of distinct words extracted from the phrase. The len(_.split()) is 16.
Then len(words) is 14. If you check carefully, you'll see that the strings 'to' and 'the'
are duplicated in the input sequence.
craps This is a set of pairs of dice. On the first roll of a Craps game, if the shooter rolls any of
these combinations, totalling 2, 3 or 12, the game is over, and the shooter has lost. Each
element in the set is a 2-tuple made up of the two individual dice.
Tip: Debugging set()
A common mistake is to do something like set( 1, 2, 3 ), which passes three values to the set() function.
If you get a TypeError: set expected at most 1 arguments, got n, you didn't provide a proper tuple to
the set factory function.
Another interesting problem is the difference between set( ("word",) ) and set( "word" ).
The first example provides a 1-element sequence, ("word",), to set(), which becomes a 1-element
set.
The second example passes a 4-character string, "word", which becomes a 4-element set.
In the case of creating sets from strings, there's no error message. The question is really what did you
mean? Did you intend to put the entire string into the set? Or did you intend to break the string down to
individual characters, and put each character into the set?
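A quick check shows the difference. The variable names here are made up for illustration.

```python
one_word = set(("word",))   # a 1-element tuple containing the whole string
letters = set("word")       # the string itself, taken one character at a time

print(len(one_word), len(letters))
print("word" in one_word, "w" in letters)
```

Checking len() right after building a set from strings is a cheap way to confirm which of the two you actually got.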
>>> fib | prime
set([1, 2, 3, 5, 7, 8, 11, 13])
>>> fib | words
set([1, 2, 3, 5, 8, 'is', 'men', 13, 'good', 'aid', 'now', 'come', 'to', 'for', 'all', 'of', 'their', 'time', 'part
In the first example, we created a union of the fib set and the prime set. In the second example, we computed
a fairly silly union that includes the fib set and the words set; since one set has numbers and the other set
has strings, it's not clear what we would do with this strange collection of unrelated things.
The union operator can also be written using method function notation.
Note that the two results of fib | words and words.union(fib) have the same elements in a different
order. We can assure that this is true with something like the following:
>>> fib | words == words.union(fib)
True
>>> fib | words == words | fib
True
The above two expressions show us that the essential mathematical rules are true, even if the order of the
elements is sometimes different.
The & operator. The & operator computes the intersection of two sets; it computes a new set which has
only the elements which are common to the two sets which are being intersected. In essence, an element is
a member of s1 & s2 if it is a member of s1 and a member of s2.
Here's the Venn diagram that uses shading to show the elements which are in the intersection of two sets.
Here are some examples.
>>> fib & prime
set([2, 3, 5, 13])
>>> fib & words
set([])
In the first example, we created an intersection of the fib set and the prime set. In the second example, we
computed a fairly silly intersection that shows that there are no common elements between the fib set and
the words set.
The - operator. The - operator computes the difference between two sets; it computes a new set which
starts with elements from the left-hand set and then removes all the matching elements from the right-hand
set. It fits well with the usual sense of subtraction. In essence, an element is a member of s1 - s2 if it is a
member of s1 and not a member of s2.
Here's the Venn diagram that uses shading to show the elements which are in the difference, s1-s2.
Here are some examples.
>>> fib-prime
set([8, 1])
>>> prime-fib
set([11, 7])
>>> fib-words
set([1, 2, 3, 5, 8, 13])
In the first example, we found the elements which are in the fib set, but not in the prime set. We can think of
this as starting with the fib set and removing all the values that are in the prime set. In the second example,
we found the elements which are in the prime set, but not in the fib set.
The third example shows the fib set with the words set removed. In this case, it's still the same fib set. We
can prove this by evaluating fib-words == fib.
The difference operator can also be written using method function notation.
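As a brief sketch of that method notation, the following assumes that fib and prime hold the values used in the examples above (the first seven Fibonacci numbers and the first six primes). The names by_method and by_operator are ours, chosen only for illustration.

```python
# Assumed values, matching the earlier examples in this section.
fib = set([1, 1, 2, 3, 5, 8, 13])
prime = set([2, 3, 5, 7, 11, 13])

by_method = fib.difference(prime)    # method function notation
by_operator = fib - prime            # operator notation

print(sorted(by_method))
print(by_method == by_operator)
```

Both forms build a new set and leave fib and prime unchanged.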
The ^ operator. The ^ operator computes the symmetric difference between two sets; it computes a new
set which has the elements that are in one or the other, but not both. Since a union is elements which are in one
set or the other, and an intersection is elements which are in both, the symmetric difference of two sets is
(s1|s2)-(s1&s2). Rather than have to write this out, we have a pleasant short-hand operator.
Here's the Venn diagram that uses shading to show the elements which are in the symmetric difference of
two sets.
Here are some examples.
>>> fib^prime
set([1, 7, 8, 11])
>>> fib^words
set([1, 'all', 'good', 5, 'for', 'to', 8, 'of', 'is', 'men', 2, 13, 'their', 3, 'time', 'party', 'the', 'now', 'com
In the first example, we found the elements which are in the fib set or the prime set, but not both. In effect,
a union is computed and the common elements removed from that union. In the second example, we found
the elements which are in the fib set or the words set, but not both. In this case, there are no common
elements, so the symmetric difference is the same as the union.
The symmetric difference operator can also be written using method function notation.
>>> prime.symmetric_difference( fib )
set([1, 7, 8, 11])
>>> prime.symmetric_difference( fib ) == prime ^ fib
True
>>> prime ^ fib == (prime|fib)-(prime&fib)
True
The set comparisons are equality and subset comparisons. Therefore, s1 <= s2 asks if set s1 is a subset of
s2. The == and != operations do what you'd expect, comparing to see if the two sets have the same collection
of elements.
>>> diff = prime & fib
>>> diff <= prime
True
>>> diff <= fib
True
In this example, we computed the intersection of prime and fib, which was the small set of numbers common
to both sets, set([2, 3, 5, 13]). This set, by definition, has to be a subset of both of the original sets.
As with other set operators, we also have method function notation for these operations.
>>> "to" in words
True
>>> diff.issubset( prime )
True
>>> prime.issuperset( diff )
True
>>> (1,2) in craps
True
set.intersection(s2) set
Returns a new set which is the intersection of the elements of s1 and s2. This is only the common
elements to both sets. This can also be written s1&s2.
>>> set( [ "now", "is" ] ).intersection( set( [ "is", "the" ] ) )
set(['is'])
set.difference(s2) set
Returns a new set which has only the elements from s1 that are not also elements of s2. The new set
is effectively a copy of s1 with elements from s2 removed. This can also be written s1-s2.
>>> set( [ "now", "is" ] ).difference( set( [ "is", "the" ] ) )
set(['now'])
set.symmetric_difference(s2) set
Returns a new set which has elements that are unique to s1 and s2. The new set is effectively the
union of s1 and s2 with the intersection elements removed. This can also be written as s1^s2.
>>> set( [ "now", "is" ] ).symmetric_difference( set( [ "is", "the" ] ) )
set(['now', 'the'])
Accessors. These method functions are comparison operators. They apply a comparison between two sets and
create a boolean value.
class set
set.issubset(s2) boolean
Returns True if s1 is a subset of s2. To be a subset, all elements of s1 must be present in s2. This can
also be written as s1 <= s2.
>>> set( [ "now", "is" ] ).issubset( set( [ "is", "now", "the" ] ) )
True
set.issuperset(s2) boolean
Returns True if s1 is a superset of s2. To be a superset, all elements of s2 must be present in s1. This
can also be written as s1 >= s2.
>>> set( [ "now", "is" ] ).issuperset( set( [ "is", "now", "the" ] ) )
False
Manipulators. This next group of methods manipulate a set by adding or removing individual elements.
These operations do not apply to a frozenset.
class set
set.add(object)
Adds the given object to set s1. If the object did not previously exist in the set, it is added. If the
object was already present in the set, then s1 doesn't change.
>>> craps=set()
>>> craps.add( (1,1) )
>>> craps.add( (6,6) )
>>> craps.add( (1,2) )
>>> craps.add( (2,1) )
>>> craps
set([(1, 2), (1, 1), (2, 1), (6, 6)])
set.remove(object)
Removes the given object from the set s1. If the object did not exist in the set, a KeyError exception
is raised.
>>> colors= set( [ "red", "black", "green" ] )
>>> colors.remove( "green" )
>>> colors
set(['black', 'red'])
set.pop() object
Removes an object from set s1, and returns it. Since there is no defined ordering to a set, any object
is eligible to be removed. If the set is already empty, a KeyError is raised.
>>> colors= set( [ "red", "black", "green" ] )
>>> while len(colors):
...     print(colors.pop())
...
green
black
red
>>> colors
set([])
set.clear()
Removes all objects from the set. After this method, the set is empty.
>>> colors= set( [ "red", "black", "green" ] )
>>> colors
set(['green', 'black', 'red'])
>>> colors.clear()
>>> colors
set([])
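Both remove() and pop() raise a KeyError when there is nothing suitable to remove. The following is a minimal sketch of that behavior; the flag names remove_failed and pop_failed are hypothetical, and the try statement is just one way to observe the exception.

```python
colors = set(["red"])
colors.remove("red")        # the set is now empty

# remove() of a missing element raises KeyError.
try:
    colors.remove("green")
    remove_failed = False
except KeyError:
    remove_failed = True

# pop() on an empty set raises KeyError as well.
try:
    colors.pop()
    pop_failed = False
except KeyError:
    pop_failed = True

print(remove_failed, pop_failed)
```

A membership test such as "green" in colors before calling remove() is another way to avoid the exception.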
Updates. The following group of methods update a set using another set of elements. Each of these
method functions parallels the operator method functions, shown above.
There is a significant difference, however. These methods actually mutate the set object to which they are
attached. Each of these functions is available as an augmented assignment operator, which emphasizes the
change to a set.
class set
set.update(s2)
Adds all the elements of set s2 to set s1. This can also be written as s1 |= s2.
>>> two=set( [ (1,1) ] )
>>> three=set( [ (2,1), (1,2)] )
>>> twelve=set( [ (6,6) ] )
>>> craps=set()
>>> craps.update( two )
>>> craps.update( three )
>>> craps.update( twelve )
>>> craps
set([(1, 2), (1, 1), (2, 1), (6, 6)])
set.intersection_update(s2)
Updates s1 so that it is the intersection of s1&s2. In effect, this removes elements from s1 which are
not also found in s2. This can also be written as s1 &= s2.
>>> ph1="now is the time for all good men to come to the aid of their party"
>>> words=set( ph1.split() )
>>> words
set(['party', 'all', 'good', 'for', 'their', 'of', 'is', 'men', 'to', 'time', 'aid', 'the', 'now', 'come'])
>>> ph2="the quick brown fox jumped over the lazy dog"
>>> words2=set( ph2.split() )
>>> words2
set(['brown', 'lazy', 'jumped', 'over', 'fox', 'dog', 'quick', 'the'])
>>> words.intersection_update(words2)
>>> words
set(['the'])
set.difference_update(s2)
Updates s1 by removing all elements which are found in s2. This can also be written as s1 -= s2.
>>> ph1="now is the time for all good men to come to the aid of their party"
>>> words=set( ph1.split() )
>>> words
set(['party', 'all', 'good', 'for', 'their', 'of', 'is', 'men', 'to', 'time', 'aid', 'the', 'now', 'come'])
>>> ph2="to do good to men unthankful is to cast water into the sea"
>>> words2=set( ph2.split() )
>>> words2
set(['do', 'good', 'cast', 'is', 'men', 'the', 'water', 'to', 'sea', 'unthankful', 'into'])
>>> words.difference_update(words2)
>>> words
set(['party', 'all', 'for', 'their', 'of', 'time', 'aid', 'now', 'come'])
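The augmented assignment forms mentioned above can be sketched as follows. The word lists are shortened, made-up versions of the phrases used in these examples, and the name same_object is only for illustration; it records that the original set object was changed in place rather than replaced by a new one.

```python
words = set("now is the time".split())
original = words                # a second name for the same set object

words |= set(["for", "all"])                        # like words.update(...)
words &= set(["now", "is", "the", "all", "dog"])    # like words.intersection_update(...)
words -= set(["the"])                               # like words.difference_update(...)

same_object = words is original
print(sorted(words), same_object)
```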
In this example, we've created a set, fib, of the first seven Fibonacci numbers. We also created a set, prime,
of the first six prime numbers. Our for statement first computes the intersection of these two sets, then sets
n to each value in that intersection.
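The kind of loop this paragraph describes might look like the following sketch. The set contents are taken from the description (seven Fibonacci numbers, which give six distinct values, and six primes); the list named common is our addition so that the result can be checked afterward.

```python
fib = set([1, 1, 2, 3, 5, 8, 13])
prime = set([2, 3, 5, 7, 11, 13])

common = []
for n in fib & prime:       # the intersection is computed once, before the loop starts
    print(n)
    common.append(n)
```

Because a set has no defined ordering, the values may be printed in any order.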
>>> craps= set([(1, 2), (1, 1), (2, 1), (6, 6)])
>>> len(craps)
4
max(iterable) value
Returns the largest value in sequence.
>>> craps= set([(1, 2), (1, 1), (2, 1), (6, 6)])
>>> max(craps)
(6, 6)
Recall that tuples are compared element-by-element. The tuple (6, 6) has a first element that is
greater than all others.
min(sequence) value
Returns the smallest value in sequence.
>>> craps= set([(1, 2), (1, 1), (2, 1), (6, 6)])
>>> min(craps)
(1, 1)
Recall that tuples are compared element-by-element. The tuple (1, 1) has a first element that is less
than all but one other tuple, (1, 2). If the first elements are the same, then the second element is
compared.
Iteration Functions. These functions are most commonly used with a for statement to process set items.
enumerate(iterable) iterator
Enumerate the elements of a set, sequence or mapping. This yields a sequence of tuples based on the
original set. Each of the result tuples has two elements: a sequence number and the item from the
original set.
Note that sets do not have a defined ordering, so this can, in principle, yield the elements of the set in
different orders. As a practical matter, the ordering doesn't spontaneously change. However, insertion
or removal of an element may appear to change the enumerated set.
This is generally used with a for statement. Here's an example:
>>> craps= set([(1, 2), (1, 1), (2, 1), (6, 6)])
>>> for position, roll in enumerate( craps ):
...     print( position, roll, sum(roll) )
...
0 (1, 2) 3
1 (1, 1) 2
2 (2, 1) 3
3 (6, 6) 12
>>> craps
set([(1, 2), (1, 1), (2, 1), (6, 6)])
We've created an ordered list from the original set: craps is a set; descending is a list in descending
order. Sets have no defined ordering, so creating a list from a set is the only way to impose a specific
order on the elements.
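A sketch of the kind of example described here, assuming that the sorted() function with its reverse option is what builds the list named descending:

```python
craps = set([(1, 2), (1, 1), (2, 1), (6, 6)])

# sorted() builds a new list; the set itself is left unchanged.
descending = sorted(craps, reverse=True)

print(descending)
print(len(craps))
```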
Aggregation Functions. The following functions create an aggregate value from a set.
sum(iterable) number
Sum the values in the iterable (set, sequence, mapping). All of the values must be numeric.
>>> odd_8 = set( range(1,8*2,2) )
>>> sum(odd_8)
64
>>> odd_8
set([1, 3, 5, 7, 9, 11, 13, 15])
all(iterable) boolean
Return True if all values in the iterable (set, sequence, mapping) are equivalent to True.
The all() function is often used with Generator Expression, which is covered in List Construction
Shortcuts.
>>> craps= set([(1, 2), (1, 1), (2, 1), (6, 6)])
>>> hardways = set( (d1,d1) for d1 in range(1,7) )
>>> horn = hardways - craps
>>> horn
set([(3, 3), (4, 4), (5, 5), (2, 2)])
>>> all( 4 <= (d1+d2) <= 10 for d1,d2 in horn )
True
 7  r1_win.add( (5,6) )
 8  r1_win.add( (6,5) )
 9  r1_lose= set( [(1,1),(6,6),(2,1),(1,2)] )
10  for d1 in range(1,7):
11      r1_win.add( (d1,7-d1) )
12      for d2 in range(1,7):
13          dice.add( (d1,d2) )
14  hardways= set( [(2,2),(3,3),(4,4),(5,5)] )
15  point= dice-r1_win-r1_lose
21  print("winners", r1_win)
22  print("losers ", r1_lose)
23  print("points ", point)
2. First, we create a number of empty sets that we'll use to examine throws of the dice in a Craps game.
The dice set will contain the complete set of all 36 possible outcomes. The r1_win set will contain the
different ways we can win on the first throw; it will have the various ways we can throw 7 or 11. The
r1_lose set will contain the different ways we can lose on the first throw; it will have the various ways
we can throw 2, 3 or 12. The point set is all of the remaining throws, which establish a point. Finally,
the hardways set contains the various points on which the two dice are equal, rolling a value the hard
way.
7. We insert the two ways of rolling 11 into the r1_win set.
9. We insert the ways of rolling 2, 12, and 3 into the r1_lose set.
10. We've set d1 to all values from 1 to-one-before 7. Therefore, the value of (d1,7-d1) will be one of the
six ways to roll a 7. We add this to the r1_win set.
12. We've set d1 to all the values from 1 to-one-before 7; independently, we've set d2 to all values from 1
to 6. We put every combination of dice rolls into dice.
14. We create a set containing the four point rolls where the two dice are equal and assign this set to the
variable hardways.
15. Finally, we take the complete set of dice, remove the roll 1 wins, remove the roll 1 losers, and assign
this set to the variable point.
Note the two assertions that we make as part of our initialization:
We assert that the dice rolls in hardways are a subset of the dice rolls in point. This is a matter of
definition in Craps, and we need to be sure that the preceding statements actually accomplish this.
We assert that the union of r1_win, r1_lose and point is the entire set of possible dice rolls. This,
also, is a matter of definition, and we need to be sure that our initialization procedure has established
the proper conditions.
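A self-contained sketch of this initialization, including the two assertions, could look like the following. The first two lines (creating the empty sets) and the exact form of the assert statements are reconstructed from the description, so treat them as assumptions rather than as the original listing.

```python
# Empty sets, assumed from the description of the initialization.
dice = set()
r1_win = set()

r1_win.add((5, 6))
r1_win.add((6, 5))
r1_lose = set([(1, 1), (6, 6), (2, 1), (1, 2)])
for d1 in range(1, 7):
    r1_win.add((d1, 7 - d1))
    for d2 in range(1, 7):
        dice.add((d1, d2))
hardways = set([(2, 2), (3, 3), (4, 4), (5, 5)])
point = dice - r1_win - r1_lose

# The two assertions described in the text (assumed form).
assert hardways <= point
assert r1_win | r1_lose | point == dice

print(len(dice), len(r1_win), len(r1_lose), len(point))
```

If either assertion fails, Python stops with an AssertionError, which tells us the sets were not built the way the rules of Craps require.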
Once we've built some sets, we can now use the sets to evaluate some dice rolls. We can use this kind of
dice-rolling experiment to evaluate a betting strategy.
set_example.py, Part 2
 1  import random
 2  for i in range(10):
 3      d1=random.randrange(1,7)
 4      d2=random.randrange(1,7)
 5      roll= (d1,d2)
 6      if roll in r1_win:
 7          print(roll, "winner")
 8      elif roll in r1_lose:
 9          print(roll, "loser")
10      else:
11          if roll in hardways:
12              print(roll, "hard point")
13          else:
14              print(roll, "point")
1. We import the random module so that we can use the randrange() function to generate random die
rolls.
5. After picking two numbers in the range of 1 to-one-before 7, we assemble the variable roll as the dice
roll.
6. If roll is in the r1_win set, we have a winner on the first roll.
8. If roll is in the r1_lose set, we have a loser on the first roll.
11. Otherwise, we have a roll that has established a point. We can check for membership in the hardways
set to see if it was one of the special ways to roll a 4, 6, 8 or 10.
The rest of the program can split the text into individual words, create a set from those words and
then display the unique words which occur in the paragraph.
Once you have that working, you can create a set of common English words, including "the", "a", "to",
"of", "in", "on", "by", "as", "and", "or", "not", "be", "make", "do", etc. The difference between your
complete set of words and this set of common English words will be the unique or unusual words in
the paragraph.
2. Dice Rolls.
The game of Craps is defined around a large number of sets. The game has two parts: the first roll
(usually called the "come out" roll, or "point off" roll), and the remaining rolls (or "point on" rolls) of
the game.
On the point-off roll. There are first-roll winners (all the ways of rolling 7 or 11), first-roll losers
(all the ways of rolling 2, 3 or 12). All remaining first-roll dice establish a point.
On the point-on rolls. There are losers (all the ways of rolling 7), winners (all the ways of rolling
the point). All remaining rolls do not resolve the game.
It's very handy to have a list of sets. Each set in the list contains all the ways of rolling that number.
We can create the empty list of sets as follows. This will give you a list, named rolls, that has empty
sets in positions 2 through 12. It also has two empty sets in positions 0 and 1, but these won't be used
for anything.
rolls= []
for n in range(13):
    rolls.append( set() )
Once you have the list named rolls, you can then enumerate all 36 dice combinations with a pair of
nested loops like the following:
for d1 in range(1,7):
    for d2 in range(1,7):
        make a two-tuple (d1,d2)
        compute the sum, d1+d2
        add to the appropriate set in the rolls list
Once you have the list of sets, you can compute sets which contain all the rolls for a win on the
first roll and all the rolls which would lose on the first roll. These are simple union operations, using
elements in the rolls list. Specifically, you'll have to union rolls[2], rolls[3] and rolls[12] for the
first roll losers.
Here's a depiction of a dictionary of 3 items. The keys are string color names, and the values are numeric
color levels. This dictionary is one way to describe a nice midnight-blue. In this case, the keys are all strings
and the values are all numbers.
>>> color= {"RED":51, "GREEN":0, "BLUE":153}
key     value
RED     51
GREEN   0
BLUE    153
Unordered. A simple dict cannot preserve order. This is because it uses a hashing algorithm
to identify a place in the dict for a given key. Every Python object must have a hash value:
a simple distinct number. Objects, like strings or tuples, have hash values which summarize
the string or tuple as a unique numeric value. The built-in function, hash() is used to do this
calculation.
>>> color
{'BLUE': 153, 'GREEN': 0, 'RED': 51}
We created the dictionary in one order; Python shows us the dictionary in a different order.
Collection. A dictionary, like the sequences we looked at in Basic Sequential Collections of
Data, is a kind of collection of objects. Since the dictionary as a whole is mutable, items can be
inserted into the dict, found in the dict and removed from the dict.
>>> color['GREEN']= 16
>>> color
{'BLUE': 153, 'GREEN': 16, 'RED': 51}
A dict object has member methods that return a sequence of keys, a sequence of values, or a
sequence of ( key , value ) tuples suitable for use in a for statement.
>>> color.keys()
['BLUE', 'GREEN', 'RED']
>>> color.items()
[('BLUE', 153), ('GREEN', 16), ('RED', 51)]
Immutable Keys. Above, we noted the mutability restriction on the key. The key object must
compute a consistent hash value. This issue of consistency is important. If the key changes, how
can we identify it in the dictionary?
For the immutable built-in types, the hash value is perfectly consistent: numbers, strings, tuples
and frozensets are all good kinds of keys for a dictionary.
The mutable types like lists, sets or dictionaries present an obvious difficulty if their value
should change.
However, you can freeze a list by making a tuple copy of it; similarly, you can freeze a set
by making a frozenset copy of it.
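A short sketch of this idea follows. The dictionary name notes and its contents are made up for illustration; the point is that a list is rejected as a key, while a tuple copy or a frozenset copy is accepted.

```python
notes = {}

roll = [6, 6]                   # a mutable list
try:
    notes[roll] = "box cars"    # lists cannot be hashed, so this fails
    list_allowed = True
except TypeError:
    list_allowed = False

notes[tuple(roll)] = "box cars"             # a tuple copy works as a key

vowels = set("aeiou")
notes[frozenset(vowels)] = "the vowels"     # so does a frozenset copy

print(list_allowed, notes[(6, 6)])
```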
Space vs. Time. We used the mathematical term map to define what a function does, back
in Adding New Verbs : The def Statement. When we define a function, we write an algorithm
which is, in effect, the mapping from the domain values to the range values. In a dictionary,
Python explicitly stores a specific set of domain values and their associated range values.
A function like square root, for example, can map any positive floating-point number to that
number's square root. We don't store all of the billions of possible floating-point numbers;
instead the math.sqrt() function computes a mapping for each specific argument value using
an algorithm. A function uses less storage, but is rather slow.
In the case of a dictionary, we can associate some specific floating-point numbers with their square
roots. We don't have a completely general algorithm, just a list of keys and their associated values.
This will use a considerable amount of storage, but will be very, very fast.
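As a rough sketch of the trade-off, the following compares computing a square root with looking one up. The dictionary name root_table and the choice of keys (the whole numbers 0.0 through 9.0) are assumptions made only for this illustration.

```python
import math

# A function: no storage needed, and it works for any non-negative number.
computed = math.sqrt(9.0)

# A dictionary: stores only the specific keys we chose ahead of time.
root_table = {}
for n in range(10):
    root_table[float(n)] = math.sqrt(n)

looked_up = root_table[9.0]
has_general_answer = 2.5 in root_table   # 2.5 was never stored

print(computed, looked_up, has_general_answer)
```

The dictionary answers only for the keys it was given; the function answers for any argument, at the cost of doing the calculation each time.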
Other Mappings? We have to emphasize a terminology issue here. Python has provisions for creating a
variety of different types of mappings. Only one type of mapping comes built-in; that type is the dict. The
terms mapping and dictionary are almost interchangeable.
In Another Mapping : The defaultdict, we'll look at another variety of mapping, called
collections.defaultdict. This is a slightly different mapping. It's a dictionary, but with some extra
features.
Python 3 will add the collections.OrderedDict class.
wheel This dictionary has eight elements. Most of the elements have a number as their key; one
of the elements has a string as the key. All the elements have a string as the value.
myBoat The myBoat dictionary has three elements. One element has a key of the string "NAME"
and a value of the string "KaDiMa". Another element has a key of the string "LOA" and a
value of the integer 18. The third element has a key of the string "SAILS" and the value of
a list ["main", "jib", "spinnaker"].
theBets The theBets is an empty dictionary.
diceRoll The diceRoll variable is a dictionary with two elements. One element has a key of a
tuple (1,1) and a value of a string, "snake eyes". The other element has a key of a tuple
(6,6) and a value of a string "box cars".
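Literals matching these four descriptions might be written as in the following sketch. The specific keys and colors in wheel are an assumption borrowed from a later example in this chapter; the other three follow the descriptions directly.

```python
wheel = {0: "green", "00": "green",
         1: "red", 2: "black", 3: "red",
         4: "black", 5: "red", 6: "black"}
myBoat = {"NAME": "KaDiMa", "LOA": 18,
          "SAILS": ["main", "jib", "spinnaker"]}
theBets = {}
diceRoll = {(1, 1): "snake eyes", (6, 6): "box cars"}

print(len(wheel), len(myBoat), len(theBets), len(diceRoll))
```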
Dictionary items and keys do not have to be the same type. Keys must be a type that can produce a
hash value. Since lists, sets and dictionary objects are mutable, they are not permitted as keys. All other
non-mutable types (especially strings and tuples) are legal keys.
Dictionary Factory Function. In addition to literal values, the following function also creates a dictionary
object.
dict(mapping) dictionary
Creates a dictionary from the items in mapping. If the mapping is omitted, an empty dictionary is
created.
dict(sequence) dictionary
Creates a dictionary from the items in sequence. Each item in the sequence must be a two-tuple with
keys and values. For example,
dict( [ ('akey','the value'), ('key2','a value') ] )
A keyword form, dict( akey='the value', key2='a value' ), is also available. This only works when the
keys are strings which satisfy the rules for Python variable names.
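The different ways of calling dict() can be compared in a short sketch. The variable names here are hypothetical; the keyword form shown last is the one restricted to keys that look like Python variable names.

```python
pairs = [('akey', 'the value'), ('key2', 'a value')]

from_sequence = dict(pairs)             # a sequence of two-tuples
from_mapping = dict(from_sequence)      # a copy built from another mapping
from_keywords = dict(akey='the value', key2='a value')
empty = dict()

print(from_sequence == from_mapping == from_keywords, empty)
```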
This example starts by creating an empty dictionary, boat1. We provide a key of "NAME" and a value
of "KaDiMa" , which updates the dictionary. We provide a key of "SAILS" and a value which is a list,
["main","jib","spinnaker"]. We also set the value 15 for the key "LOA"; this turns out to be incorrect,
so we replaced the 15 with an 18.
When we ask for the value of boat1, the dictionary is displayed, showing the key:value pairs. Notice that the
order does not correlate to the order in which we entered keys and values.
When we evaluate boat1["NAME"], we see the value that is associated with this key.
When we evaluate boat1["BEAM"], we see that any attempt to access a missing key gives us a KeyError
exception.
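The steps described above can be sketched as follows. This is reconstructed from the description rather than copied from a session, and the flag beam_missing is our own addition to show the KeyError without stopping the script.

```python
boat1 = {}
boat1["NAME"] = "KaDiMa"
boat1["SAILS"] = ["main", "jib", "spinnaker"]
boat1["LOA"] = 15
boat1["LOA"] = 18        # replaces the incorrect value

print(boat1["NAME"])

try:
    boat1["BEAM"]        # this key was never added
    beam_missing = False
except KeyError:
    beam_missing = True
print(beam_missing)
```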
Here are some other examples of picking elements out of a dictionary. In this case, we get the list value and
use it in a for statement.
>>> for s in boat1["SAILS"]:
...     print(s)
...
main
jib
spinnaker
The % Operator. The string format operator works between string and dictionary. We prefer to use
str.format(), however.
The string formatting method, str.format() can be applied to a dictionary as well as a sequence.
When this operator was introduced in Sequences of Characters : str and Unicode, the format specifications
were applied to values from a sequence. We named the values by providing a simple position number. The
format specification of {0:d} used the first value, the one at position zero. The format {1:5.2f} used the
value at position one.
When we apply the format specifications to values that include a dictionary, each format specification can
include a dictionary key to pick a specific item from within the dictionary. We can use the position number
along with the dictionary key in a syntax like {0[LOA]:d}. In this case item zero in the format arguments
is expected to be a dictionary and the key value "LOA" will be the item that gets formatted.
For example:
>>> myBoat= { "NAME": "Red Ranger", "LOA": 42 }
>>> "{0[NAME]}, {0[LOA]:d} feet".format( myBoat )
'Red Ranger, 42 feet'
This will find myBoat["NAME"] and use default formatting; it will find myBoat["LOA"] and use d number formatting.
We can also provide a dictionary as the only argument to the format method. There are two ways to do
this. We'll show more of this in A Dictionary of Extra Keyword Values.
The dict() function can build a dictionary from arguments where specific parameter names are provided.
We can use key=value to build a dictionary when the keys are strings that follow Python variable name
rules.
>>> dict( NAME="Red Ranger", LOA=42 )
{'LOA': 42, 'NAME': 'Red Ranger'}
The arguments to all functions follow this rule. Because the arguments to the str.format() method follow
these standard rules, we can provide values in the form of a dictionary.
Consider this example:
>>> "{NAME}, {LOA:d} feet".format( NAME="Red Ranger", LOA=42 )
'Red Ranger, 42 feet'
The arguments to the str.format() method become a dictionary. The keys are used directly in the format
strings.
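One way to hand an existing dictionary to str.format() is the ** notation, which spreads the dictionary's items out as keyword arguments. This is a minimal sketch that reuses the myBoat values from above; the names by_position and by_keyword are ours.

```python
myBoat = {"NAME": "Red Ranger", "LOA": 42}

by_position = "{0[NAME]}, {0[LOA]:d} feet".format(myBoat)
by_keyword = "{NAME}, {LOA:d} feet".format(**myBoat)

print(by_keyword)
print(by_position == by_keyword)
```

The keyword form keeps the format string shorter, but it only works when every key is a string that follows the variable name rules.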
If you want to work with the values, you have to use the values() method of the dictionary.
>>> wheel = { 0:"green", "00": "green",
1:"red", 2:"black", 3:"red",
4:"black", 5:"red", 6:"black" }
>>> "red" in wheel.values()
True
>>> "blue" in wheel.values()
False
class dict
dict.clear()
Remove all items from the dictionary.
>>> freq= { 'red':470, 'green':52, 'black':478 }
>>> freq.clear()
>>> freq
{}
>>> import random
>>> wheel= 18*['black']+18*['red']+2*['green']
>>> freq= { }
>>> color= random.choice(wheel)
>>> freq.setdefault( color, 0 )
0
>>> freq[color] += 1
>>> freq
{'red': 1}
dict.update(new)
Merge values from the new dictionary into the original dictionary, d, adding or replacing as needed. It is
equivalent to the following Python statement: for k in new.keys(): d[k]= new[k]
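A sketch of that equivalence, using made-up frequency counts; the names d, new and by_loop are for illustration only.

```python
d = {'red': 470, 'green': 52}
new = {'green': 53, 'black': 478}

# The same merge written as an explicit loop, applied to a copy of d.
by_loop = d.copy()
for k in new.keys():
    by_loop[k] = new[k]

d.update(new)    # adds 'black', replaces the value for 'green'

print(d == by_loop)
```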
Accessors. The following accessors determine a fact about a dictionary and return that as a value.
class dict
dict.copy() dictionary
Copy the dictionary to make a new dictionary. This is a shallow copy. All objects in the new dictionary
are references to the objects in the original dictionary.
dict.get(key [, default ]) object
Get the item with the given key, similar to d[key]. If the key is not present, supply default instead.
If no value is given for default, the value None is used.
>>> freq= { 'red':470, 'green':52, 'black':478 }
>>> freq.get('red','N/A')
470
>>> freq.get('white','N/A')
'N/A'
>>> freq['red']
470
>>> freq['white']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'white'
dict.has_key(key)
If there is an entry in the dictionary with the given key, return True, otherwise return False.
This is usually written key in dictionary. The dictionary.has_key( key ) form isn't used very
often.
>>> freq= { 'red':470, 'green':52, 'black':478 }
>>> freq.has_key('red')
True
>>> freq.has_key('white')
False
>>> 'red' in freq
True
dict.items() sequence
Return all of the items in the dictionary as a sequence of ( key , value ) tuples. Note that these are
returned in no particular order.
>>> freq= { 'red':470, 'green':52, 'black':478 }
>>> freq.items()
[('black', 478), ('green', 52), ('red', 470)]
dict.keys() sequence
Return all of the keys in the dictionary as a sequence of keys. Note that these are returned in no
particular order.
dict.values() sequence
Return all the values from the dictionary as a sequence. Note that these are returned in no particular
order.
>>> freq= { 'red':470, 'green':52, 'black':478 }
>>> freq.values()
[478, 52, 470]
Notice that the keys are provided in no particular order. The dictionary is optimized for raw speed, and this
means that the keys can be scrambled.
We can use the sorted() function to handle this.
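>>> monDict = { "JAN":1, "FEB":2, "MAR":3, "APR":4, "MAY":5, "JUN":6 }
>>> for name in sorted( monDict ):
...     print(name)
...
APR
FEB
JAN
JUN
MAR
MAY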
Additional for techniques. Additionally, we can use several dictionary method functions to extract a
sequence of values from the dictionary. We'll look at items(), keys() and values().
Because the for statement works with multiple assignment, and the items() method function returns a
sequence of tuples, we have a powerful technique for iterating through a dictionary. For example:
>>> monDict = { "JAN":1, "FEB":2, "MAR":3, "APR":4, "MAY":5, "JUN":6 }
>>> for name, number in monDict.items():
...     print(number, name)
...
3 MAR
2 FEB
4 APR
6 JUN
1 JAN
5 MAY
The items() method of the dictionary named monDict returns a sequence with each entry transformed to
a ( key , value ) tuple. The multiple assignment in the for statement assigns the keys to name and the
values to number as it iterates through each element of the sequence. Note that the values returned bear
little relationship to the order in which the dictionary was created.
The del statement. The del statement removes items from a dictionary. For example
>>> i = { "two":2, "three":3, "quatro":4 }
>>> del i["quatro"]
>>> i
{'two': 2, 'three': 3}
In this example, we removed a key (and its associated value) from a dictionary by specifying which key we
wanted removed.
max(dictionary) value
Returns the greatest key in the dictionary.
>>> diceRoll = { (1,1): "snake eyes", (6,6): "box cars" }
>>> max(diceRoll)
(6, 6)
Since our keys are tuples, they are compared element-by-element; the tuple (6, 6) is the greatest key.
min(dictionary) value
Returns the least key in the dictionary.
>>> diceRoll = { (1,1): "snake eyes", (6,6): "box cars" }
>>> min(diceRoll)
(1, 1)
If you want to apply max() or min() to the values instead of the keys, you'll use the values() method. It
would look like this.
>>> a = { 23:'skidoo', 7:'eleven', 4:'ever' }
>>> max(a)
23
>>> max(a.keys())
23
>>> max(a.values())
'skidoo'
Generally, functions like sum(), any() and all() don't make a lot of sense when applied to the keys of a
dictionary. You often apply these to the values, however.
Iteration Functions. These functions are most commonly used with a for statement to process dictionary
keys.
enumerate(iterable) iterator
Enumerate the elements of a set, sequence or mapping. This yields a sequence of tuples based on the
original dictionary. Each of the result tuples has two elements: a sequence number and the key from the
original dictionary.
Since dictionaries have no guaranteed ordering, this isn't completely sensible.
This is generally used with a for statement. Here's an example:
>>> freq= { 'red':470, 'green':52, 'black':478 }
>>> for position, color in enumerate(freq):
...     print(position, color, freq[color])
...
0 black 478
1 green 52
2 red 470
Note that the order as enumerated is not the order originally entered.
sorted( iterable [,key] [,reverse] ) iterator
This iterates through an iterable object like the keys of a mapping in ascending or descending sorted
order. Unlike a list's sort() method function, this does not update the map, but leaves it alone.
This is generally used with a for statement. Here's an example:
>>> freq= { 'red':470, 'green':52, 'black':478 }
>>> for color in sorted( freq ):
...     print(color, freq[color])
...
black 478
green 52
red 470
Producing output sorted by value is a bit trickier. The keys must be unique, but the values don't have
to be unique. That makes it impossible to determine which key belongs to a value.
What we do to report on a dictionary in value order is to use the list of tuples representation produced
by the items() method.
>>> freq= { 'red':470, 'green':52, 'black':478 }
>>> freq.items()
[('black', 478), ('green', 52), ('red', 470)]
>>> def by_freq( freq_pair ): return freq_pair[1]
...
>>> sorted( freq.items(), key=by_freq )
[('green', 52), ('red', 470), ('black', 478)]
reversed(iterable) iterator
This iterates through an iterable (set, sequence, mapping) in reverse order.
Since dictionaries have no guaranteed ordering, this isn't completely sensible.
This is generally used with a for statement.
all(iterable) boolean
Return True if all values in the iterable (set, sequence, mapping) are equivalent to True.
When applied to a mapping, this will test the keys. More often we will use a Generator Expression,
which allows us to apply the all test to the values.
>>> myBoat = { "NAME":"KaDiMa", "LOA":18, "HULL":"mono",
...     "SAILS":["main","jib","spinnaker"] }
>>> all( v is not None for v in myBoat.values() )
True
any(iterable) boolean
Return True if any value in the iterable (set, sequence, mapping) is equivalent to True.
When applied to a mapping, this will test the keys. More often we will use a Generator Expression,
which allows us to apply the any test to the values.
>>> fireSail = { "NAME":None, "LOA":16, "HULL":"catamaran",
...     "SAILS":["main","jib"] }
>>> any( v is None for v in fireSail.values() )
True
Iterate through this sequence, placing each word into a dictionary. The first time a word is seen, the
frequency should be set to 1. Each time the word is seen again, increment the frequency. The final
dictionary will be a frequency table.
To alphabetize the frequency table, extract just the keys. A sequence can be sorted (see Flexible
Sequences : The list). This sorted sequence of keys can be used to extract the counts from the
dictionary.
2. Stock Reports.
A block of publicly traded stock has a variety of attributes; we'll look at a few of them. A stock has
a ticker symbol and a company name. Create a simple dictionary with ticker symbols and company
names.
For example:
stockDict = { 'GM': 'General Motors',
'CAT':'Caterpillar', 'EK':"Eastman Kodak" }
Create a simple list of blocks of stock. These could be tuples with ticker symbols, prices, dates and
number of shares. For example:
purchases = [ ( 'GE', 100, '10-sep-2001', 48 ),
( 'CAT', 100, '1-apr-1999', 24 ),
( 'GE', 200, '1-jul-1998', 56 ) ]
Create a purchase history report that computes the full purchase price (shares times dollars) for each
block of stock. Use the full company names in stockDict to look up the full company name. This is
the basic relational database join algorithm between two tables.
The outline of processing looks like this:
for s in purchases:
    look up s[0] in stockDict
    compute shares * price
    print a nice-looking line
Create a second purchase summary that which accumulates total investment by ticker symbol. In the
above sample data, there are two blocks of GE stock. These can be combined by creating a dictionary
where the key is the ticker and the value is a list of blocks that have a common ticker symbol.
The outline of the processing looks like this:
blocks = {}
for s in purchases:
    symbol= s[0]
    if symbol in blocks:
        Append this block to the list blocks[symbol]
    else:
        Create a 1-element list in blocks[symbol]
A pass through the resulting dictionary can then create a report showing each ticker symbol and all
blocks of stock. The outline of the processing looks like this:
for symbol,blockList in blocks.items():
    totalValue= 0
    totalShares= 0
    for s in blockList:
        compute value as shares × price
        accumulate value in totalValue
        accumulate shares in totalShares
    print a nice-looking line showing totals
3. Date Decoder.
A date of the form 8-MAR-85 includes the name of the month, which must be translated to a number.
Create a dictionary suitable for decoding month names to numbers. Create a function which uses
string operations to split the date into 3 items using the - character. Translate the month, correct
the year to include all of the digits.
The function will accept a date in the dd-MMM-yy format and respond with a tuple of ( y, m, d ).
4. Dice Odds.
There are 36 possible combinations of two dice. A simple pair of loops over range(1,7) will enumerate
all combinations. The sum of the two dice is more interesting than the actual combination. Create a
dictionary of all combinations, using the sum of the two dice as the key.
Each value in the dictionary should be a list of tuples; each tuple has the value of two dice. The general
outline is something like the following:
d= {}
Loop with d1 from 1 to 6
    Loop with d2 from 1 to 6
        newTuple = ( d1, d2 ) # create the tuple
        oldList = d.get( d1+d2, [] ) # an empty list the first time a sum is seen
        newList = oldList + [ newTuple ]
        d[ d1+d2 ] = newList
Loop over all values in the dictionary
    print the key and the length of the list
Great question. Database designers call this a secondary index. We want to have two different sets
of keys for the same individual phone book objects. We can do this by creating a second dictionary
with our alternate key.
Let's assume we have a list of tuples that looks like the following.
names= [ ( "last name", "first name", "phone number" ),
( "Howard", "Moe", "555-1111" ),
( "Howard", "Shemp", "555-2222" ),
( "Fine", "Larry", "555-3333" ), ]
We might want to turn this into two dictionaries doing something like the following. This will decompose the list into the individual name tuples, and assign each tuple to nameTuple. We can associate
this tuple object in two different dictionaries. In this example, we'll assign the tuple to byName and
byPhone.
byName = dict()
byPhone= dict()
for nameTuple in names:
    ln, fn, ph = nameTuple
    byName[ln]= nameTuple
    byPhone[ph]= nameTuple
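Once both dictionaries exist, either key retrieves the same tuple object. Here is a minimal sketch of using the two indexes; it leaves out the header row from the sample list, which is a choice made here so the dictionaries only hold real entries.

```python
# Sample data from the text; the header row is omitted in this sketch.
names = [
    ("Howard", "Moe", "555-1111"),
    ("Howard", "Shemp", "555-2222"),
    ("Fine", "Larry", "555-3333"),
]

byName = dict()
byPhone = dict()
for nameTuple in names:
    ln, fn, ph = nameTuple
    byName[ln] = nameTuple    # a later "Howard" replaces an earlier one
    byPhone[ph] = nameTuple   # every phone number here is different

print(byPhone["555-1111"])
print(byName["Fine"])
print(byName["Howard"])
print(byName["Fine"] is byPhone["555-3333"])
```

Because two entries share the last name Howard, byName keeps only the last one assigned. An index that must keep every match would store a list of tuples under each key.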
We have four ways that we can use this function using positional parameters: roll(), roll(5), roll(1,8),
roll(4,12). These calls will roll two standard dice, five standard dice, one eight-sided die and four 12-sided
dice.
When confronted with additional positional argument values to a function, Python must raise a TypeError
exception. While this makes sense, we can see that there are alternatives. For example, Python could silently
ignore the extra values. This alternative is unacceptable, because it would give us no warning of making a
common mistake. We'll look at another, acceptable alternative below.
Here's an example of misusing our roll() function. We provided too many values and got a TypeError
exception.
>>> roll(4,8,12)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: roll() takes at most 2 arguments (3 given)
Excess Positional Argument Values. Python gives us a way to define a function that will collect all the
extra argument values into a tuple instead of raising an exception. This allows a function to work with an
indefinite number of argument values, the way max() and min() do.
If you want a collection of positional argument values in a tuple, you provide a parameter of the form *extras.
Your variable, here called extras, will receive a sequence with all of the extra positional arguments. The *
is part of the syntax and tells Python that this parameter gets all of the unexpected positional argument
values.
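To see exactly what the starred parameter receives, here is a small sketch. The function collect() and its parameter names are hypothetical and exist only for this illustration.

```python
def collect(first, *extras):
    """Hypothetical helper: return the fixed argument and the tuple of extras."""
    return first, extras

only_one = collect(4)            # no extra values: extras is an empty tuple
several = collect(4, 8, 12)      # the surplus values arrive together as a tuple

print(only_one)
print(several)
```

The first call shows that extras is always a tuple, even when there are no surplus arguments, so the function body can loop over it without a special case.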
The myMax Function. The following function accepts all of the positional arguments in a single parameter,
args. This parameter will be a tuple with all of the argument values.
def myMax( *args ):
    max= args[0]
    for a in args[1:]:
        if a > max: max= a
    return max
The *args parameter specifies that a tuple of all arguments is assigned to the parameter variable args. We
take the first of these values (args[0]) as our current guess at the maximum value, max. We use a for loop
that will set a to each of the other arguments, computed as a slice of args starting with the second element.
If a is larger than our current guess, we update the current guess, max. At the end of the loop, the postcondition is that we have visited every element in the tuple args; the current guess must be the largest value.
We can use myMax() the same way we use the built-in max().
>>> myMax(4,8,12)
12
Bonus Questions. What happens when we ask for myMax()? Is this sensible? What does the built-in
max() do?
def add3( a, b, c ):
    return a + b + c
Python lets us prefix a sequence object with * to indicate that this object should supply the rest of the
positional parameters with values.
We can evaluate this add3() function using the * notation to assign parameters from a sequence.
>>> add3( 9, 3, 2 )
14
>>> some_sequence = [ 9, 3, 2 ]
>>> add3( *some_sequence )
14
Of course, we can combine the last two lines into something like this.
>>> add3( *[ 9, 3, 2 ] )
14
A printf Function. Here's another example of a function with one fixed parameter and an unlimited
number of additional positional parameters.
We must have the fixed parameter first. It must be followed by all the extra arguments.
def printf( format_string, *vals ):
    print( format_string.format( *vals ) )
Here, we've defined the first parameter to be format_string. Any other argument values will be collected
into a parameter called vals.
Notice that this is another example of the head-tail pattern that we noted when talking about iterators.
In this case, we have one positional parameter at the head and the remaining positional parameters are the
tail.
We provided the vals sequence of argument values to the str.format() method function.
>>> printf( "spin: {0:d} {1}", 1, "red" )
spin: 1 red
>>> printf( "{0:d} rolls, ending roll: {1:d} {2:d}", rolls, dice[0], dice[1] )
23 rolls, ending roll: 3 4
We can evaluate this function a number of ways. Two optional parameters give us four combinations of
forms. We can use keywords for each argument, also, giving us a total of 14 different forms for using this
function. We won't enumerate them; we'll only show a few.
The first example uses positional arguments and default values.
The next example uses a mixture of positional and keyword arguments.
The final example shows all keyword arguments.
>>> diceRolls(5)
[[5, 2], [1, 4], [6, 6], [4, 5], [6, 5]]
>>> diceRolls(5,sides=8)
[[3, 7], [3, 4], [4, 3], [6, 6], [1, 7]]
>>> diceRolls(dice=5,rolls=3)
[[1, 5, 6, 5, 5], [1, 1, 4, 3, 2], [3, 2, 6, 3, 6]]
Here's what happens if we try to break the rules. The first example shows what happens when we don't
provide a value for a required parameter. The last example shows what Python does with an unexpected
keyword.
>>> diceRolls(dice=2,sides=4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: diceRolls() takes at least 1 non-keyword argument (0 given)
>>> diceRolls(3,dice=5,label="yacht game")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: diceRolls() got an unexpected keyword argument 'label'
When confronted with additional keyword arguments, Python must raise a TypeError exception. While this
makes sense, we can see that there are alternatives. For example, Python could silently ignore the extra
keywords. This is unacceptable, because it would give us no warning of making a common mistake. We'll look
at another alternative below.
Excess Keyword Argument Values. Python gives us a way to define a function that will collect all the
extra keyword argument values into a dictionary instead of raising an exception. This allows a function
to work with an indefinite number of argument values.
If you want the extra keyword arguments collected into a dictionary, you provide a parameter of the form
**extras. Your variable, here called extras, will receive a dictionary with all of the extra keywords and
their argument values. The ** is part of the syntax and tells Python that this parameter gets all of the
unexpected keyword arguments.
Rate Time Distance. The following function accepts an arbitrary number of keyword arguments in
a single parameter, named args.
def rtd( **args ):
    if "rate" in args and "time" in args:
        args["distance"]= args["rate"]*args["time"]
    if "rate" in args and "distance" in args:
        args["time"]= args["distance"]/args["rate"]
    if "time" in args and "distance" in args:
        args["rate"]= args["distance"]/args["time"]
    return args
We defined this function to accept an arbitrary number of keyword arguments. These are collected into a
dictionary, named args. We identify the combination of rate, time and distance by checking for a given
key in the dictionary. For each combination, we can solve for the remaining value and update the dictionary
by inserting the additional key and value into the dictionary.
This function returns a small dictionary with the missing value computed from the other two values. If
for some reason it cannot compute a new value from the input keyword arguments, it returns the original
arguments dictionary. Another possibility for this situation is to raise an exception indicating that the
problem does not compute.
Here are two examples of using this rtd() function.
>>> print(rtd(rate=60, time=45/60))
{'distance': 45.0, 'rate': 60.0, 'time': 0.75}
>>> print(rtd(distance=173, time=2+50/60))
{'distance': 173, 'rate': 61.058823529411761, 'time': 2.8333333333333335}
The first one computes the distance when traveling 60 MPH for 45 minutes.
The second shows the average speed when going 173 miles in 2 hours and 50 minutes.
Python lets us prefix a dictionary object with ** to indicate that this object should supply values for
parameters by keyword.
We can evaluate this almost() function using the ** notation to assign parameters from a dictionary.
>>> almost( 355/113, math.pi, 0.0000001 )
True
>>> keywords = {'a':355/113, 'b':math.pi, 'eps':0.0000001}
>>> almost( **keywords )
True
This flexibility allows us to create the arguments for a function in a variety of ways.
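The * and ** prefixes can also be combined in a single call. The following is a minimal sketch; the function describe(), its parameters and the sample values are all made up for illustration and are not part of the examples above.

```python
def describe(name, length, hull="monohull"):
    """Hypothetical function: two required parameters and one optional one."""
    return "{0}: {1} ft {2}".format(name, length, hull)

positional = ["Sample Boat", 16]     # made-up values for the first two parameters
keywords = {"hull": "catamaran"}     # keys must match parameter names

plain = describe(*positional)                # same as describe("Sample Boat", 16)
mixed = describe(*positional, **keywords)    # sequence first, then dictionary
all_kw = describe(**{"name": "Sample Boat", "length": 16, "hull": "catamaran"})

print(plain)
print(mixed)
print(all_kw)
```

A dictionary key that does not match any parameter name raises the same TypeError as an unexpected keyword written out by hand.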
What not to do. Here's an example of a poorly-done function definition. Unwisely, it uses a mutable
object as a default value.
import random
def createRolls( aList=[] ):
    for i in range(1000):
        roll = random.randint(1,6), random.randint(1,6)
        aList.append(roll)
    return aList
We've defined a function that can be used two ways. Here's one use case: we're providing a list that we want
to have updated with a sequence of dice rolls.
>>> myRolls= []
>>> c= createRolls(myRolls)
>>> len(myRolls)
1000
>>> len(c)
1000
>>> c is myRolls
True
The above looks about right. The createRolls() function created 1000 dice rolls and appended them to the given
list, myRolls.
Here's what happens when we use the mutable default value.
>>> a= createRolls()
>>> len(a)
1000
>>> b= createRolls()
>>> len(b)
2000
>>> len(a)
2000
>>> a is b
True
What happened?
The first time we evaluated createRolls(), we used a default value for aList. This list was updated with
1000 dice rolls.
The second time, we also used a default value. Since it was the same default value, we updated the same
list object with another 1000 dice rolls.
It turns out that there's only one mutable list object that's part of the definition of createRolls().
Mutable Default Values. If you must have a mutable object as a default value, you'll need to do something
like this.
import random
def createRolls( aList=None ):
    if aList is None: aList= []
    for i in range(1000):
        roll = random.randint(1,6), random.randint(1,6)
        aList.append(roll)
    return aList
The if aList is None: aList= [] will create a fresh, new empty list when no argument value is provided.
Each time the function is evaluated, it won't be reusing the single default value.
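We can repeat the experiment that exposed the problem to check the repaired version. The sketch below repeats the definition so that it runs on its own; the formatted print is just one way to show the three results on a single line.

```python
import random

def createRolls(aList=None):
    if aList is None:
        aList = []          # a fresh list for every call that omits aList
    for i in range(1000):
        roll = random.randint(1, 6), random.randint(1, 6)
        aList.append(roll)
    return aList

a = createRolls()
b = createRolls()

# Each call built its own list, so the lengths stay at 1000 and a is not b.
print("{0} {1} {2}".format(len(a), len(b), a is b))
```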
The Problem. Raising a KeyError exception isn't always the most desirable response. Let's look at
something like the following.
import random
def even_odd( spin ):
    if spin == 0 or spin == 37: # assumption: 37 stands for 00, since randrange(0,38) yields 0 to 37
        even_odd = "0"
    elif spin % 2 == 0:
        even_odd = "even"
    else:
        even_odd = "odd"
    return even_odd

freq = { }
for i in range(1000):
    spin = random.randrange(0,38)
    result = even_odd( spin )
    freq[result] += 1
We're trying to accumulate a simple frequency distribution of even vs. odd vs. zero spins on a roulette
wheel. While this for loop is short and sweet, it can't work.
Run it and see what happens.
When we get to freq[result] += 1 (line 15), the required key value may not be in the dictionary.
We can change our for loop to this.
freq = { }
for i in range(1000):
    spin = random.randrange(0,38)
    result = even_odd( spin )
    if result not in freq:
        freq[result] = 0
    freq[result] += 1
This is unappetizing because we're executing the if result not in freq statement each time through the
loop even though it's needed rarely.
A faster alternative is the following.
freq = { }
for i in range(1000):
    spin = random.randrange(0,38)
    result = even_odd( spin )
    try:
        freq[result] += 1
    except KeyError:
        freq[result] = 1
This is unappetizing: the try block is kind of big and ugly for a pretty standard problem.
Solution. There's something simpler, however: the defaultdict in the collections module.
The collections module gives us a very cool variation on the dict theme. The defaultdict doesn't raise
KeyError; instead, it creates a new entry in the dictionary for us.
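The three definitions described next (freq, index and labels) can be sketched as follows. The variable names and factory functions come from those descriptions; the sample keys and values are made up for illustration.

```python
import collections

freq = collections.defaultdict(int)               # missing keys start at int(), which is 0
index = collections.defaultdict(list)             # missing keys start at list(), which is []
labels = collections.defaultdict(lambda: "N/A")   # missing keys get the string "N/A"

for word in ["red", "black", "red"]:
    freq[word] += 1                   # no KeyError the first time a word is seen

index["Howard"].append("Moe")         # creates the list, then appends to it
index["Howard"].append("Shemp")

labels["GM"] = "General Motors"

print(dict(freq))
print(dict(index))
print(labels["GM"])
print(labels["XYZ"])
```

Note that merely looking up a missing key, as in labels["XYZ"], adds that key to the dictionary with the default value.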
freq This is the usual way to define a frequency table. We can then use freq[someKey] += 1
without a second thought.
If someKey is not in the dictionary, the defaultdict will call the supplied factory function,
int(), which returns a zero; this is added to the dictionary. Then it can add one to the
value associated with the key.
index This is the usual way to define an index. An index has a key value and a list of values
associated with that key. We can then use index[someKey].append( anotherValue ) to
put items into our index.
If someKey is not in the dictionary, the defaultdict will call the supplied factory function,
list(), which returns an empty list; this is added to the dictionary. Then it can append
anotherValue to the list in the dictionary.
labels This is the usual way to have a dictionary of string labels. We can say labels[someKey]
and get a string value, either the proper associated label or a special "N/A" string.
The lambda: is a way to define an anonymous function. Why do we need this? The
defaultdict can't work with a simple literal; it must have a factory function. If we want
a simple literal, we can wrap it in a lambda expression.
Lambda Expressions
Lambda expressions can be confusing. Whenever you see a lambda you can always rewrite it as
something like this.
def return_na(): return "N/A"
labels= collections.defaultdict( return_na )
If we sort the values into order, we don't know which key ((1, 1) or (6, 6)) the value 1 is associated
with.
We can handle this, however, by creating an inverted index for the dictionary.
Part 1. First, create a dictionary of dice rolls. The key will be a dice tuple, the value will be an integer.
Create a defaultdict using int() as the factory for default values. Use a for loop to generate 1000
random rolls of two dice. Increment the frequency counts in your dictionary.
The result should be a dictionary with no more than 36 dice combinations and their frequencies.
Part 2. Invert this dictionary. The keys will be frequency. The value will be a list of dice rolls with
that given frequency.
Create a defaultdict using list() as the factory for default values. Iterate through the frequency
dictionary getting the roll and the count. Build the new dictionary by using the count as a key and
appending the roll to the list.
Part 3. Print the new dictionary sorted in ascending order by the key. This key for the new dictionary
is a count. The value will be a list of rolls that occurred with the given frequency.
CHAPTER
TWELVE
Files are one of the most important features of our operating system. For background, see Hardware Terminology for various kinds of hardware on which files reside.
In Software Terminology there is some background on the operating system structure and protocol that
defines the files or documents we work with. The operating system insulates us from the complexities of
various devices and provides us some handy abstractions that make it much easier to save, find and manage
our documents.
Up until now, we've used very few files in just three ways. We've used a file named python (Windows
python.exe) heavily. This file contains the binary program that is Python; we've run this program by
typing a command in a terminal window or double-clicking an icon. We've also used a file named idle.py;
this is a Python script that contains the IDLE program. Finally, we've also saved our various scripts in files
and asked Python to run those files.
One of the most important things that our programs can do is read or write files. Files are a mixture of
two unrelated concepts: they are a collection of data items, and they involve our OS notion of a file system,
file names, directories, and devices. We'll introduce files in External Data and Files. We'll add the OS
processing in Files, Contexts and Patterns of Processing. We'll wrap up with an overview of all the things
files are used for in File-Related Library Modules.
The name of the file. The operating system will interpret this name using its working directory rules.
If the name starts with / (or device:\) it's an absolute name. Otherwise, it's a relative name; the
current working directory plus this name identifies the file.
Python can translate standard paths (using /) to Windows-specific paths. This saves us from having
to really understand the differences. We can name all of our files using /, and avoid the messy details.
We can, if we want, use raw strings to specify Windows path names using the \ character.
The access mode for the file. This is some combination of read, write and append. The mode can also
include instructions for interpreting the bytes as characters.
Optionally, we can include the buffering for the file. Generally, we omit this. If the buffering argument
is given, 0 means each byte is transferred as it is read or written. A value of 1 means the data is buffered
a line at a time, suitable for reading from a console, or writing to an error log. Larger numbers specify
the buffer size: numbers over 4,096 may speed up your program.
Once we create the file object, we can do operations to read characters from the file or write characters to
the file. We can read individual characters or whole lines. Similarly, we can write individual characters or
whole lines.
When Python reads a file as a sequence of lines, each line will become a separate string. The '\n' character
is preserved at the end of the string. This extra character can be removed from the string using the rstrip()
method function.
A file object (like a sequence) can create an iterator which will yield the individual lines of the file. You can,
consequently, use the file object in a for statement. This makes reading text files very simple.
When the work is finished, we also need to use the file's close() method. This empties the in-memory
buffers and releases the connection with the operating system file. In the case of a socket connection, this
will release all of the resources used to assure that data travels through the Internet successfully.
dataSource This example opens the existing file name_addr.csv in the current working directory for reading. The variable dataSource identifies this file object, and we can use this
variable for reading strings from this file.
This file is opened in binary mode.
newPage This example creates a new file addressbook.html (or it will truncate this file if it
exists). The file will be in the current working directory. The variable newPage identifies
the file object. We can then use this variable to write strings to the file.
theErrors This example appends to the file error.log (or creates a new file, if the file doesn't
exist). The file has the directory path /usr/local/log/. Since this is an absolute name, it
doesn't depend on the current working directory.
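The three descriptions above correspond to open() calls along the following lines. This is only a sketch: it uses a temporary directory in place of both the current working directory and /usr/local/log/, and it first writes a one-line stand-in for name_addr.csv. Both are assumptions made so the example runs anywhere.

```python
import os
import tempfile

work_dir = tempfile.mkdtemp()    # stand-in for the working directory and the log directory
csv_path = os.path.join(work_dir, "name_addr.csv")
html_path = os.path.join(work_dir, "addressbook.html")
log_path = os.path.join(work_dir, "error.log")

# Create a small input file so the read example has something to open.
seed = open(csv_path, "w")
seed.write('"Howard","Moe","555-1111"\n')
seed.close()

dataSource = open(csv_path, "rb")    # existing file, opened for reading in binary mode
newPage = open(html_path, "w")       # created, or truncated if it already exists
theErrors = open(log_path, "a")      # appended to, or created if it does not exist

first_line = dataSource.readline()
newPage.write("<html></html>\n")
theErrors.write("sample message\n")

for f in (dataSource, newPage, theErrors):
    f.close()
```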
Buffering files is typically left as a default, specifying nothing. However, for some situations, adjusting the
buffering can improve performance. Error logs, for instance, are often unbuffered, so the data is available
immediately. Large input files may be opened with large buffer numbers to encourage the operating system
to optimize input operations by reading a few large chunks of data from the device instead of a large number
of smaller chunks.
Tip: Debugging Files
There are a number of things that can go wrong in attempting to create a file object.
If the file name is invalid, you will get operating system errors. Usually they will look like this:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IOError: [Errno 2] No such file or directory: 'wakawaka'
It is very important to get the file's path completely correct. You'll notice that each time you start IDLE,
it thinks the current working directory is something like C:\Python26. You're probably doing your work in
a different default directory.
When you open a module file in IDLE, you'll notice that IDLE changes the current working directory to the
directory that contains your module. If you have your .py files and your data files all in one directory, you'll
find that things work out well.
The next most common error is to have the wrong permissions. This usually means trying to write to a
file you don't own, or attempting to create a file in a directory where you don't have write permission. If
you are using a server, or a computer owned by a corporation, this may require some work with your system
administrators to sort out what you want to do and how you can accomplish it without compromising
security.
The [Errno 2] note in the error message is a reference to the internal operating system error numbers.
There are over 100 of these error numbers, all collected into the module named errno. There are a lot of
different things that can go wrong, many of which are very, very obscure situations.
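A program can look at the error number instead of just letting the traceback print. The following is a minimal sketch; the file name is a made-up one that is assumed not to exist, and the outcome strings are only for illustration.

```python
import errno

try:
    open("wakawaka-no-such-file.csv", "r")   # hypothetical name, assumed to be missing
    outcome = "opened"
except IOError as e:
    if e.errno == errno.ENOENT:              # ENOENT is the [Errno 2] in the message
        outcome = "missing"
    else:
        outcome = "other error {0}".format(e.errno)

print(outcome)
```

Comparing against the name errno.ENOENT rather than the number 2 keeps the test readable.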
The Python file object has a number of operations that transform the file object, read from or write to
the OS file, or access information about the file object.
Reading. The following read methods get data from the OS file. These operations may also change the
Python file object's internal status and buffers. For example, at end-of-file, the internal status of the file
object will be changed. Most importantly, these methods have the very visible effect of consuming data from
the OS file.
file.read(size) → string
Read as many as size characters from file f as a single, large string. If size is negative or omitted, the
rest of the file is read into a single string.
from __future__ import print_function
dataSource= open( "name_addr.csv", "r" )
theData= dataSource.read()
for n in theData.splitlines():
    print(n)
dataSource.close()
file.readline(size) → string
Read the next line or as many as size characters from file f; an incomplete line can be read. If size is
negative or omitted, the next complete line is read. If a complete line is read, it includes the trailing
newline character. If the file is at the end, f.readline() returns a zero length string. If the file has a
blank line, this will be a string of length 1, just the newline character.
from __future__ import print_function
dataSource= open( "name_addr.csv", "r" )
n= dataSource.readline()
while len(n) > 0:
    print(n.rstrip())
    n= dataSource.readline()
dataSource.close()
file.readlines(hint)
Read the next lines or as many lines from the next hint characters from file f. The hint size may be
rounded up to match an internal buffer size. If hint is negative or omitted, the rest of the file is read.
All lines will include the trailing newline character. If the file is at the end, f.readlines() returns a
zero length list.
When we simply reference a file object in a for statement, we get these same lines one at a time,
without building the whole list first.
dataSource= open( "name_addr.csv", "r" )
for n in dataSource:
    print(n.rstrip())
dataSource.close()
Writing. The following methods send data to the OS file. These operations may also change the Python
file object's internal status and buffers. Most importantly, these methods have the very visible effect of
producing data to the OS file.
file.flush()
Flush all accumulated data from the internal buffers of file f to the device or interface. If a file is
buffered, this can help to force writing of a buffer that is less than completely full. This is appropriate
for log files, prompts written to sys.stdout and error messages.
file.truncate(size)
Truncate file f. If size is not given, the file is truncated at the current position. If size is given, the file
will be truncated at or before size. This function is not available on all platforms.
file.write(string)
Write the given string to file f. Buffering may mean that the string does not appear on a console until
a close() or flush() operation is used.
newPage= open( "addressbook.html", "w" )
newPage.write( "<html>\n<head><title>Hello World</title></head>\n<body>\n" )
newPage.write( "<p>Hello World</p>\n" )
newPage.write( "</body>\n</html>\n" )
newPage.close()
file.writelines(list)
Write the list of strings to file f. Buffering may mean that the strings do not appear on any console
until a close() or flush() operation is used.
newPage= open( "addressbook.html", "w" )
newPage.writelines( [ "<html>\n", "<head><title>Hello World</title></head>\n", "<body>\n" ] )
newPage.writelines( ["<p>Hello World</p>\n" ] )
newPage.writelines( [ "</body>\n", "</html>\n" ] )
newPage.close()
Accessors. The following file accessors provide information about the file object.
file.tell() → integer
Return the position from which file f will be processed. This is a partner to the seek() method; any
position returned by the tell() method can be used as an argument to the seek() method to restore
the file to that position.
file.fileno() → integer
Return the internal file descriptor (fd) number used by the OS library when working with file f. A
number of modules provide access to these low-level libraries for advanced operations on devices and
files.
file.isatty() → boolean
Return True if file f is connected to an OS file that is a console or keyboard.
file.closed → boolean
This attribute of file f is True if the file is closed.
file.mode → string
This attribute is the mode argument to the open() function that was used to create the file object.
file.name
This attribute of file f is the filename argument to the open() function that was used to create the file
object.
Transformers. The following file transforms change the file object itself. This includes closing it (and
releasing all OS resources) or changing the position at which reading or writing happens.
file.close()
Close file f. The closed flag is set. Any further operations (except a redundant close) raise a ValueError
exception.
file.seek(offset [, whence ])
Change the position from which file f will be processed. There are three values for whence which
determine the direction of the move.
If whence is 0 (the default), move to the absolute position given by offset. f.seek(0) will rewind file
f.
If whence is 1, move relative to the current position by offset bytes. If offset is negative, move backwards;
otherwise move forward.
If whence is 2, move relative to the end of file. f.seek(0,2) will advance file f to the end.
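The three whence values and the tell() partner method can be tried out on a small scratch file. This is a sketch under some assumptions: the file name and contents are made up, and the file is opened in binary mode so that positions are plain byte counts.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "seek_demo.txt")   # hypothetical scratch file
out = open(path, "wb")
out.write(b"first line\nsecond line\n")
out.close()

f = open(path, "rb")
f.readline()           # consume the first line
mark = f.tell()        # remember where the second line starts
f.seek(0, 2)           # whence 2: jump to the end of the file
size = f.tell()        # the position at the end is the file size in bytes
f.seek(mark)           # whence 0 (the default): return to the saved position
second = f.readline()
f.seek(-5, 1)          # whence 1: back up five bytes from the current position
tail = f.read()
f.seek(0)              # rewind to the beginning
first = f.readline()
f.close()

print("{0} {1}".format(mark, size))
```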
Here's a quick example that shows one way to read this file using the file's iterator. This isn't the best way;
that will have to wait for The csv Module.
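A sketch of such a reader follows, with comments keyed to the four numbered notes after it. The listing first writes a tiny stand-in for name_addr.csv in a temporary directory; that file's contents and location are assumptions made so the sketch runs on its own.

```python
import os
import tempfile

# Stand-in data file (an assumption; the real name_addr.csv is not reproduced here).
path = os.path.join(tempfile.mkdtemp(), "name_addr.csv")
seed = open(path, "w")
seed.write('"Howard","Moe","555-1111"\n"Fine","Larry","555-3333"\n')
seed.close()

dataSource = open(path, "r")     # note 1: create the file object in read mode
lines = []
for line in dataSource:          # note 2: the file's iterator yields each line
    print(line.rstrip())         # note 3: print the line without its newline
    lines.append(line)           # kept only so the result can be inspected
dataSource.close()               # note 4: release the operating system resources
```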
1. We create a Python file object for the name_addr.csv in the current working directory in read mode.
We call this object dataSource.
2. The for statement creates an iterator for this file; the iterator will yield each individual line from the
file.
3. We can print each line.
4. We close the file when we're done. This releases any operating system resources that our program tied
up while it was running.
A More Complete Reader. Here's a program that reads this file and reformats the individual records.
It prints the results to standard output. This approach to reading CSV files isn't very good. In the next
chapter, we'll look at the csv module that handles some of the additional details required for a really reliable
program.
nameaddr.py
 1  #!/usr/bin/env python
 2  """Read the name_addr.csv file."""
 3  dataSource = open( "name_addr.csv", "r" )
 4  for addr in dataSource:
 5      # split the string on the ,'s
 6      quotes= addr.split(",")
 7      # strip the '"'s from each field
 8      fields= [ f.strip('"') for f in quotes ]
 9      print( fields[0], fields[1], fields[2], fields[4] )
10  dataSource.close()
3. We open the file name_addr.csv in our current working directory. The variable dataSource is our
Python file object.
4. The for statement gets an iterator from the file. It can then use the iterator, which yields the individual
lines of the file. Each line is a long string. The fields are surrounded by "s and are separated by ,s.
6. We use the split() function to break the string up using the ,s. This particular process won't work
if there are ,s inside the quoted fields. We'll look at the csv module to see how to do this better.
8. We use the strip() function to remove the "s from each field. Notice that we used a list comprehension
to map from a list of fields wrapped in "s to a list of fields that are not wrapped in "s.
Seeing Output with print. The print() function does two things. When we introduced print() back
in Seeing Results : The print Statement, we hustled past both of these things because they were really quite
advanced concepts.
We covered strings in Sequences of Characters : str and Unicode. We're covering files in this chapter. Now
we can open up the hood and look closely at the print() function.
1. The print() function evaluates all of its expressions and converts them to strings. In effect, it calls
the str() built-in function for each argument value.
2. The print() function writes these strings, separated by a separator character, sep. The default
separator is a space, ' '.
3. The print() function also writes an end character, end. The default end is the newline character,
'\n'.
The print() function has one more feature which can be very helpful to us. We can provide a file parameter
to redirect the output to a particular file.
We can use this to write lines to sys.stderr.
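Here is a minimal sketch of the sep, end and file parameters working together. It assumes Python 3; under Python 2.6 the from __future__ import print_function line used elsewhere in this chapter must be the first statement of the script. The io.StringIO buffer is an in-memory stand-in for a real file, chosen here so the result can be inspected.

```python
import io
import sys

# Assumes Python 3; under Python 2.6, "from __future__ import print_function"
# must be the first statement in the script.
print("Something went wrong", file=sys.stderr)    # goes to the error stream

buffer = io.StringIO()                            # an in-memory stand-in for a file
print("spin", 1, "red", sep="-", end="!\n", file=buffer)
captured = buffer.getvalue()

print(repr(captured))
```

Each argument is converted with str(), joined with the separator, and followed by the end string, exactly as the three numbered steps describe.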
print >> file [ , expression , ... ]
The >> is an essential part of this peculiar syntax. This is an odd special case punctuation that doesn't
appear elsewhere in the Python language. It's called the chevron print.
Important: Python 3
This chevron print syntax will go away in Python 3. Instead of a print statement with a bunch of special
cases, we'll use the print() function.
Opening A File and Printing. This example shows how we open a file in the local directory and write
data to that file. In this example, we'll create an HTML file named addressbook.html. We'll write some
content to this file. We can then open this file with Firefox or Internet Explorer and see the resulting web
page.
addrpage.py
#!/usr/bin/env python
"""Write the addressbook.html page."""
from __future__ import print_function
new_page = open( "addressbook.html", "w" )
print('<html>', file=new_page)
print(' <head>'
    '<meta http-equiv="content-type" content="text/html; charset=us-ascii">'
    '<title>addressbook</title></head>', file=new_page)
print(' <body><p>Hello world</p></body>', file=new_page )
print('</html>', file=new_page)
new_page.close()
Our two examples, addrpage.py and name_addr.py, are really two halves of a single program. One
program reads the names and addresses, the other program writes an HTML file. We can combine these
two programs to reformat a CSV source file into a resulting HTML page.
The names and addresses could be formatted in a web page that looks like the following:
<html>
<head><title>Address Book</title></head>
<body>
<table>
<tr><td>last name</td><td>first name</td><td>email address</td></tr>
<tr><td>last name</td><td>first name</td><td>email address</td></tr>
<tr><td>last name</td><td>first name</td><td>email address</td></tr>
...
</table>
</body>
</html>
Each of our input fields becomes an output field sandwiched in between <td> and </td>. In this case,
we use phrases like last name, first name and email address to show where real data would be inserted.
The other HTML elements like <table> have to be printed as they're shown in this example.
Your final program should open two files: name_addr.csv and addressbook.html. Your program
should write the initial HTML material (up to the first <tr>) to the output file. It should then read
the CSV records, writing a complete address line between <tr> and </tr>. After it finishes reading and
writing names and addresses, it has to write the last of the HTML file, from </table> to </html>.
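One piece of this exercise, turning three fields into one table row, can be sketched on its own. The function name table_row() and the sample values are hypothetical; opening the two files and looping over the CSV records is left to the exercise.

```python
def table_row(last_name, first_name, email):
    """Return one HTML table row with each field wrapped in <td>...</td>."""
    fields = (last_name, first_name, email)
    cells = "".join("<td>{0}</td>".format(field) for field in fields)
    return "<tr>" + cells + "</tr>"

# Hypothetical sample values, standing in for one CSV record.
row = table_row("Smith", "Ann", "ann@example.com")
print(row)
```

This sketch does no escaping, so real data containing characters like < or & would need extra care before being written into a web page.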
Notice that we didn't need to do web_page.close(). That close was handled for us by the with statement.
The csv module gives us a handy definition called a reader which will extract individual records from the
file, properly match up the "s, and correctly split fields on the ,s.
The csv.reader() function creates an iterator object that both gets individual lines from the file and does all of
the necessary decoding for us. We can use this CSV iterator with the for statement to correctly parse every
line from the file.
To use the csv module, we must use import csv. This introduces the module's functions. We're interested
in csv.reader() for this first example.
Some spreadsheet software writes unusual line-ending sequences to CSV files. In order to handle the unusual
line-ending characters, we have to open the file with a mode of "rb".
from __future__ import print_function
import csv
with open( "name_addr.csv", "rb" ) as naFile:
    rdr= csv.reader( naFile )
    for person in rdr:
        print( person[0], person[2], person[4] )
When you run this program, you'll notice that the header line in the file is being processed as if it were data.
We'd like to skip past this gracefully. Since rdr is an iterator, we can use rdr.next() to get the first line
from the file.
from __future__ import print_function
import csv
with open( "name_addr.csv", "rb" ) as naFile:
    rdr= csv.reader( naFile )
    header= rdr.next()
    for person in rdr:
        print( person[0], person[2], person[4] )
This version quietly saves the header in header, and processes the data rows separately.
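The same header-skipping idea can be tried without a file on disk. This is a minimal sketch that assumes Python 3, where the method call rdr.next() is written with the built-in next() function instead; the column names and rows are invented stand-ins for name_addr.csv.

```python
import csv
import io

# Hypothetical sample data standing in for name_addr.csv.
sample = io.StringIO(
    "last,title,first,phone,email\n"
    "Smith,Ms,Ann,555-0100,ann@example.com\n"
    "Jones,Mr,Bob,555-0101,bob@example.com\n"
)

rdr = csv.reader(sample)
header = next(rdr)   # consume the first row so the loop only sees data rows
people = [(person[0], person[2], person[4]) for person in rdr]
for person in people:
    print(person[0], person[1], person[2])
```

The built-in next() is also available in Python 2.6, so next(rdr) is a spelling that works in both versions.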
The stock, date and time are quoted strings. The other fields are generally numbers, typically in dollars
or percents with two digits of precision. There are a few exceptions to this format for indexes and mutual
funds.
This is a very old example of the file. The prices of these stocks may have changed, but the file format hasn't
changed one bit.
The first line shows a quote for an index: the Dow-Jones Industrial average. The trading volume doesn't
apply to an index, so it is N/A, without quotes. The second line shows a regular stock (Apple Computer)
that traded 8,122,800 shares on June 15, 2001. The third line shows a mutual fund. The detailed opening
price, day's high, day's low and volume are not reported for mutual funds.
After looking at the results on line, we clicked on the link to save the results as a CSV file. We called it
quotes.csv. The following program will open and read the quotes.csv file after we download it from this
service.
stockquote.py
 1  #!/usr/bin/env python
 2  from __future__ import print_function
 3  import csv
 4  with open( "quotes.csv", "r" ) as quote_file:
 5      qRdr= csv.reader( quote_file )
 6      for quote in qRdr:
 7          stock, price, dt, tm, chg, opn, dHi, dLo, vol = quote
 8          print(stock, price, dt, tm, chg, vol)
4. We open our quotes file for reading, creating an object named quote_file. This file object in our Python
program will read from the quotes.csv file on our disk.
5. By using the csv.reader function, we create an iterator which will parse each line of the CSV file,
returning a list of data values with the quotes and commas removed.
6. We use a for statement to iterate through the sequence of lines in the file. Each line of the file is a list
of values that comprise a single stock quote, quote.
7. We use multiple assignment to assign each field of the quote to a relevant variable.
When we're finished processing the file, its role as context manager in the with statement will assure that it's
closed. This will release any resources like file descriptors or buffers that were associated with this file.
We can easily sort data in a list, using the sort() method function. So, our solution must first read the data,
creating a list. We can sort the list, then write the list in sorted order for processing by another program.
In this case, we'll sort our stock quotes by company, the first field in each quote record. For simplicity we'll
write the sorted CSV file to sys.stdout. We'll look at some extensions to this program to sort by different
fields and write to a different output file.
stocksort.py
 1  #!/usr/bin/env python
 2  from __future__ import print_function
 3  import csv
 4  with open( "quotes.csv", "r" ) as quote_file:
 5      qRdr= csv.reader( quote_file )
 6      data= [ tuple(quote) for quote in qRdr ]
 7
 8  def name(quote):
 9      return quote[0]
10
11  data.sort( key=name )
12  for q in data:
13      print( q[0], q[1], q[2], q[3] )
4. We create a file object referencing our quotes.csv file. We use csv.reader() to create an iterator
which will parse each line of the CSV file, returning a list of data values with the quotes and commas
removed.
6. We use a list comprehension to create a list-of-tuples data structure from the contents of the file.
This comprehension creates a list as follows.
Iterate over each quote of the file, setting variable quote to each line produced by the CSV reader.
This line will be a list of values with nine elements, representing the stock, price, date, time,
change, opening price, daily high, daily low and volume traded.
We transform each individual quote from a 9-item list into a 9-tuple.
8. We define a key function named name(). This function returns the key for sorting. In this case, the
key is item zero of each quote, which is the name of the stock.
11. We sort the data sequence. We use our function definition to find the key for each quote. This kind of
sort is covered in depth in Sorting a List: Expanding on the Rules.
12. Once the sequence of data elements is sorted, we can then write the company, price, date and time in
company name order.
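The sorting step can be tried by itself, without reading a file. In this minimal sketch the rows are hypothetical four-item tuples (stock, price, date, time) rather than the nine-item rows of the real quotes.csv file.

```python
# Hypothetical quote rows, already converted to tuples.
data = [
    ("STOCK-C", "30.25", "6/15/2001", "4:02PM"),
    ("STOCK-A", "10.00", "6/15/2001", "4:01PM"),
    ("STOCK-B", "20.50", "6/15/2001", "4:02PM"),
]

def name(quote):
    """Key function: the stock name is item zero of each quote."""
    return quote[0]

# sort() calls name() once per row and orders the rows by the returned keys.
data.sort(key=name)
names_in_order = [q[0] for q in data]
for q in data:
    print(q[0], q[1], q[2], q[3])
```

The key function receives a whole row and returns only the piece to compare, so the rows themselves stay intact while their order changes.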
Tip: Debugging CSV Input
One problem with file processing is that our Python data structure isn't a giant string of characters. However,
the file is simply a giant string. Essentially, reading a file is a way of translating the characters into a useful
Python structure.
The most common thing that can go wrong is not creating the expected structure in our Python program.
In the Reading and Sorting example, we might not create our list of tuples correctly.
It is helpful to print the value of the data variable to get a good look at the data structure which is produced.
Here we show the beginning of our list of tuples. We've adjusted the Python output to make it a little
more readable.
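A minimal sketch of this kind of inspection, using the standard pprint module on hypothetical rows (the values are invented, not taken from the real quotes file):

```python
import pprint

# Hypothetical list of tuples, shaped like the result of reading a CSV file.
data = [
    ("STOCK-A", "10.00", "6/15/2001", "4:01PM"),
    ("STOCK-B", "20.50", "6/15/2001", "4:02PM"),
    ("STOCK-C", "30.25", "6/15/2001", "4:02PM"),
]

# pformat() returns the nicely laid-out text that pprint() would print.
text = pprint.pformat(data[:2])
print(text)

# Two quick structural checks: every row is a tuple, and all rows have the same length.
all_tuples = all(isinstance(row, tuple) for row in data)
row_lengths = set(len(row) for row in data)
print(all_tuples, row_lengths)
```

Printing only the first few rows (data[:2] here) keeps the output short enough to read when the real file has hundreds of lines.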
Looking at the intermediate results helps us be sure that we are reading the file properly.
A more interesting modification is to add various function definitions for different sorts. For instance, if we
wanted to sort by price (field 1), we could make the following change. We can define any number of functions
and use one of them in the sort() method function.
def name(quote):
    return quote[0]
def price(quote):
    # Convert the price string to a number so the sort is numeric.
    return float(quote[1])
data.sort( key=price )
Bonus Question. Why did we add the calls to the built-in function float()? What happens if we take
those function calls out? What is the difference between comparing strings of digits and comparing numeric
values? For review, see Sorting a List: Expanding on the Rules.
This file contains a header line that names the data columns, making processing much more reliable. If the
web site adds a field or changes the order of the fields, we can use this column title information to assure
that our program doesn't need to be changed.
We can use the column titles to create a dictionary for each line of data. By making a dictionary of each
line, we can identify each piece of data by the column name, not by the position. Identifying data by column
name is generally more clear. It's also immune to changes in the column order.
This file has two lines of junk that we want to gracefully ignore. First, it has a trailing USD line, which
shows the cash position of the portfolio. Second, it has a Totals: line which doesn't seem to have anything.
We'll need to discard these two lines.
portfolio.py
 1  #!/usr/bin/env python
 2  from __future__ import print_function, division
 3  import csv
 4  with open( "dwnld_portinfo-3.csv", "r" ) as posn_file:
 5      pDictRdr= csv.DictReader( posn_file )
 6      invest= 0
 7      current= 0
 8      for posn in pDictRdr:
 9          if posn[""] == "Totals:":
10              continue
11          if posn["TICKER"] == "USD":
12              continue
13          print(posn)
14          invest += float(posn["PURCHASE PRICE"])*float(posn["# SHARES"])
15          current += float(posn["PRICE"])*float(posn["# SHARES"])
16  print(invest, current, (current-invest)/invest)
4. We open our portfolio position file for reading, creating an object named posn_file.
5. We use our input file, posn_file, to create a csv.DictReader. This reader will do three things: it will
match up " characters, split fields on , characters, and use the first line of the file as keys to create a
dictionary.
Each row will be a dictionary. The key will be the column header, and the value will be this row's
data value.
6. We also initialize two counters, invest and current, to zero. These will accumulate our initial investment
and the current value of this portfolio.
8. We use a for statement to iterate through the positions in the file. Each position will be a dictionary,
assigned to the variable posn.
We can get each field's value using the column title. For example, we get the ticker symbol using
posn["TICKER"].
9. Our first piece of processing is a filter. The totals line has the value Totals: in the unnamed column.
We'll ignore the totals line at the end (posn[""] == "Totals:") by continuing the loop. The cash
position has a ticker symbol of USD. We'll ignore the cash position (posn["TICKER"] == "USD") by
continuing the loop.
14. Our second piece of processing is some simple calculations. In this case, we convert the purchase price
to a number, convert the number of shares to a number and multiply to determine how much we spent
on this stock. We accumulate the sum of these products into invest.
We also convert the current price to a number and multiply this by the number of shares to get the
current value of this stock. We accumulate the sum of these products into current.
16. When the loop has terminated, the with statement closes the file. We write out the two numbers and
compute the relative change.
The conversion to a dictionary makes our business rules relatively easy to read.
If we wanted to be really precise, we could say things like the following to separate the issue of identifying
the cash position line from the processing for the cash position line. The boolean variable cashPosition is
set to True when we identify the cash position line in the file.
cashPosition= posn["TICKER"] == "USD"
if cashPosition:
    continue
Additionally, we could make the processing more clear by expanding it into the following. We would separate
the conversion from string to number from the calculation using that number.
shares= float(posn["# SHARES"])
purch= float(posn["PURCHASE PRICE"])
invest += shares * purch
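Putting the dictionary reader, the two filters and the separated conversions together gives something like the following. It is a sketch under stated assumptions: it uses Python 3, the column titles are the ones named in the text, and the rows are invented sample data held in memory rather than the real downloaded file.

```python
import csv
import io

# Hypothetical stand-in for the downloaded portfolio file. The first column
# has an empty title, which is where the "Totals:" marker appears.
sample = io.StringIO(
    ",TICKER,# SHARES,PURCHASE PRICE,PRICE\n"
    ",AAA,100,10.00,12.00\n"
    ",BBB,50,20.00,18.00\n"
    ",USD,1,500.00,500.00\n"
    "Totals:,,,,\n"
)

invest = 0.0
current = 0.0
for posn in csv.DictReader(sample):
    # Identify the junk lines first, then decide what to do with them.
    totalsLine = posn[""] == "Totals:"
    cashPosition = posn["TICKER"] == "USD"
    if totalsLine or cashPosition:
        continue
    # Convert strings to numbers, then calculate with the numbers.
    shares = float(posn["# SHARES"])
    purch = float(posn["PURCHASE PRICE"])
    price = float(posn["PRICE"])
    invest += shares * purch
    current += shares * price

change = (current - invest) / invest
print(invest, current, change)
```

Because every field is looked up by its column title, reordering the columns in the sample data would not change the result.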
This is an important fit and finish issue for GNU/Linux programs. A well-behaved program can
use sys to get argument values so that any names of files or directories are not hard-coded into the
program. Additionally we should always use sys.stdout and sys.stdin to make it easy to reuse programs.
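As a small illustration of this advice, here is a sketch of a line counter that takes its file names from an argument list and falls back to standard input. The function name count_lines() and the sample input are hypothetical, and the sketch assumes Python 3.

```python
import io
import sys

def count_lines(argv, stdin=None, stdout=None):
    """Count the lines in the files named in argv[1:], or in stdin if none are named."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    total = 0
    if len(argv) > 1:
        for file_name in argv[1:]:
            with open(file_name, "r") as source:
                total += sum(1 for line in source)
    else:
        total = sum(1 for line in stdin)
    print(total, file=stdout)
    return total

# A real script would call count_lines(sys.argv). Here an in-memory buffer
# stands in for standard input so the sketch can run anywhere.
demo_total = count_lines(["count_lines.py"], stdin=io.StringIO("one\ntwo\nthree\n"))
```

Nothing about the files is hard-coded: the same function works on names typed on the command line, on piped input, or on test data.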
From someone's head to a file. Our program's job is some kind of knowledge capture. Even if
we're writing a program to help artists paint or musicians compose, we're capturing knowledge (or
ideas or art or relationships) that started in someone's head. We'll then be encoding the knowledge
(or idea or artwork) and saving it on a device attached to a computer. In short, we'll be creating files.
From a file to someone's head. Our program will be reading and processing data that reside on a
computer. If we're reading a web page, looking at a stock portfolio or reviewing results of a simulation,
data starts in computer files and we read them. If we're playing a game, we're reading the game
information and player actions from files and displaying the state of the game.
From file to file. Our program will be reading and writing data on the computer. For example, if
we're applying an audio filter to an MP3 file, we're starting with a file, processing the data in that
file, and creating a new file.
All the processing we want to do will involve files in one way or another. Files are either input or output or
both. We'll focus on disk files because they're the easiest to work with as a beginner.
We'll talk about how data is organized on files in File Organization and Structure.
There are a number of library modules that are relevant to file processing.
File and Directory Access
Generic OS Services
Data Persistence
Data Compression and Archiving
Internet Data Handling
Python Runtime Services
We can then look at some common variations on file processing in Files are the Plumbing of a Software
Architecture.
A file of characters could be understood as a Python program. Usually, we emphasize this by making sure
the file name ends in .py.
A .csv file is a sequence of lines. Each line has one or more fields, wrapped in "", and separated by ,s.
Hiding the Details. We use the notion of layers to understand the structure of files. We build up from the
foundation layer (bytes) to ASCII or Unicode characters, and then to yet higher layers built on this foundation. On top of the
character foundation, our choices fan out in many, many directions. We'll stick to the most common file
types: HTML, XML, Python, .CSV.
This idea of layers of meaning, from bytes to characters to lines to meaningful records, is yet another
application of the abstraction principle. Python allows us to imagine that a file consists of lines of characters.
It provides us an abstraction that conceals the details of all those bytes and how they are encoded. We can
do the same thing in our software by writing a function that reads a line and transforms it into a meaningful
tuple or object that we can process.
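A function of that kind might look like the following minimal sketch. The record layout (last name, first name, email) and the names Person and parse_person() are assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical record layout: three comma-separated fields per line.
Person = namedtuple("Person", ["last", "first", "email"])

def parse_person(line):
    """Turn one line of characters into a meaningful Person record."""
    fields = [field.strip() for field in line.rstrip("\n").split(",")]
    last, first, email = fields
    return Person(last, first, email)

record = parse_person("Smith, Ann, ann@example.com\n")
print(record.last, record.first, record.email)
```

The rest of a program can then work with record.email and never think about commas or newline characters. A simple split like this doesn't handle quoted fields that contain commas; the csv module covers that case.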
Non-Character Files. There are many common file formats which do not have obvious character encodings.
Image files contain encodings of the picture elements, pixels. A photo of your family on vacation in Stockholm
might be a 4.6 megapixel image. This image has 2560 × 1920 individual dots, each of which can be any of
16 million different colors. A raw image file could be 14 million individual bytes of data.
When I look at my computer, I see that the .jpeg files are much smaller. It turns out that these files are
compressed. Some clever experts defined ways to reduce the number of bytes required to capture most of
the image with enough accuracy that some of us barely notice the difference between a JPEG image and a
RAW image.
An audio file might have 48,000 samples per second; spread over three minutes, this leads to 8.2 million individual
samples. Each sample could be one of 4000 amplitude levels, leading us to 12.4 million individual bytes of
data. An MP3 file uses a clever algorithm to compress all of these bytes down to 3.5 million bytes that sound
pretty much the same as the original AIFF audio file.
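The arithmetic behind these sizes can be worked through in a few lines. This sketch rests on three assumptions that the text does not state: that 16 million colors means 24 bits (3 bytes) per pixel, that 4000 amplitude levels means 12 bits per sample, and that "million" is used in the binary sense of 2**20. It assumes Python 3 division.

```python
MEGA = 2 ** 20   # assumption: "million" here means 1,048,576

# Image: 2560 x 1920 pixels.
pixels = 2560 * 1920
image_bytes = pixels * 3            # assumption: 3 bytes (24 bits) per pixel

# Audio: 48,000 samples per second for three minutes.
samples = 48000 * 3 * 60
audio_bytes = samples * 12 // 8     # assumption: 12 bits per sample

for label, count in (
    ("pixels", pixels),
    ("image bytes", image_bytes),
    ("audio samples", samples),
    ("audio bytes", audio_bytes),
):
    print("%-14s %10d  (%.2f x 2**20)" % (label, count, count / MEGA))
```

Counted in ordinary decimal millions the same figures come out a little larger, for example about 4.9 million pixels and 8.6 million samples.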
Database Files. A database is one or more very highly organized files. A database may contain text,
audio and images. That means that the content of a database may contain bytes that must be interpreted
as characters, bytes that can be interpreted as MP3-encoded audio, and bytes that can be understood as
JPEG-encoded images.
The top-most layer is the meaning of all that data and all those bytes. In this case, the database may be
a summary of decades of custom quilt-making, with pictures, stories, and descriptions of dozens of quilts.
Tip: Debugging File Formats
When we talk about how data appears in files, we are talking about data representation. This is a difficult
and sometimes subtle design decision. A common question is "How do I know what the data is?". There
are two important points of view.
The program you are designing will save data in a file for processing later. Since you are designing
the file, you get to choose the representation. You can pick something that is easy for your Python
program to write. Or, you can look at other programs and pick something that is easy for the other
programs to read. This can be a difficult balancing act.
The program you are designing must read data prepared by another program. Since someone else
designed the file, you will interpret the data they provide. If their format is something that Python
can easily interpret, your program will be very simple. However, the more common situation is that
their format is not something Python can interpret, and you must write this interpretation yourself.
os.path.dirname(path) → directory
Return the directory name, the first half of the result created by os.path.split( path ).
>>> import os
>>> fn='/Users/slott/Documents/Writing/NonProg2.5/notes/portfolio.py'
>>> os.path.dirname(fn)
'/Users/slott/Documents/Writing/NonProg2.5/notes'
os.path.exists(path) → boolean
Return True if the pathname refers to an existing file or directory.
os.path.getatime(path) → time
Return the last access time of a file, reported by os.stat(). See the time module for functions
to process the time value.
>>> import os
>>> import time
>>> fn='/Users/slott/Documents/Writing/NonProg2.5/notes/portfolio.py'
>>> os.path.getatime( fn )
1246637163.0
>>> time.ctime(_)
'Fri Jul 3 12:06:03 2009'
os.path.getmtime(path) → time
Return the last modification time of a file, reported by os.stat(). See the time module for
functions to process the time value.
os.path.getsize(path) → long integer
Return the size of a file, in bytes, reported by os.stat().
>>> import os
>>> fn='/Users/slott/Documents/Writing/NonProg2.5/notes/portfolio.py'
>>> os.path.getsize( fn )
175L
os.path.isdir(path) → boolean
Return True if the pathname refers to an existing directory.
os.path.isfile(path) → boolean
Return True if the pathname refers to an existing regular file.
os.path.join(string[, ... ]) → string
Join path components using the appropriate path separator. This is the best way to assemble
long path names from component pieces. It is operating-system independent, and understands all
of the operating system's punctuation rules.
>>> import os
>>> os.path.join( '/Users', 'slott', 'Documents', 'Writing' )
'/Users/slott/Documents/Writing'
os.path.split(path) → tuple
Split a pathname into two parts: the directory and the basename (the filename, without path
separators, in that directory). The result (s, t) is such that os.path.join( s, t ) yields the
original path.
>>> import os
>>> fn='/Users/slott/Documents/Writing/NonProg2.5/notes/portfolio.py'
>>> os.path.split( fn )
('/Users/slott/Documents/Writing/NonProg2.5/notes', 'portfolio.py')
os.path.splitdrive(path) → tuple
Split a pathname into a drive specification and the rest of the path. Useful on DOS/Windows/NT.
Useless for Linux or Mac OS.
os.path.splitext(path) → tuple
Split a path into root and extension. The extension is everything starting at the last dot in the
last component of the pathname; the root is everything before that. The result tuple ( root , ext
) is such that root + ext yields the original path.
>>> import os
>>> fn='/Users/slott/Documents/Writing/NonProg2.5/notes/portfolio.py'
>>> dir, file = os.path.split(fn)
>>> os.path.splitext( file )
('portfolio', '.py')
This program imports the sys and os.path modules. The variable oldFile is set to each file name that
is listed in the sequence sys.argv by the for statement.
Each file name is split into the path name and the base name. The base name is further split to separate
the file name from the extension. The os.path module does this correctly for all operating systems, saving us
having to write platform-specific code. For example, splitext() correctly handles the situation where
a Linux file has multiple .s in the file name.
The extension is tested to be .HTML. The processing only applies to these files. A new file name is
joined from the path, base name and a new extension (.BAK). The old and new file names are printed
and some processing, defined in the process() function, uses the oldFile and newFile names.
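A program matching this description could be sketched as follows. The details are assumptions: the helper name backup_names() is invented, process() is an empty placeholder, the extension test ignores upper and lower case, and a fixed list of names stands in for sys.argv[1:].

```python
import os.path

def process(oldFile, newFile):
    """Hypothetical placeholder for the real work done with the two names."""
    pass

def backup_names(file_names):
    """Yield (oldFile, newFile) pairs for each name with an .HTML extension."""
    for oldFile in file_names:
        dir_name, base_name = os.path.split(oldFile)
        root, ext = os.path.splitext(base_name)
        if ext.upper() != ".HTML":   # assumption: ignore case when testing
            continue
        newFile = os.path.join(dir_name, root + ".BAK")
        yield oldFile, newFile

# In a real script the names would come from sys.argv[1:].
sample_names = ["notes/index.HTML", "notes/readme.txt", "a.b.html"]
for oldFile, newFile in backup_names(sample_names):
    print(oldFile, "->", newFile)
    process(oldFile, newFile)
```

The name a.b.html shows why splitext() is worth using: only the last dot starts the extension, so the new name keeps the a.b part.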
Path Processing
Programmers are faced with a dilemma between writing a simple hack to strip paths or extensions
from file names and using the os.path module.
Some programmers argue that the os.path module is too much overhead for such a simple problem as
removing the .html from a file name.
Other programmers recognize that most hacks are a false economy: in the long run they do not save
time, but rather lead to costly maintenance when the program is expanded or modified.
shutil The shutil module automates copying entire files or directories. This saves the steps of opening,
reading, writing and closing files when there is no actual processing, simply moving files.
When we have complex programs that need to preserve a backup copy of a file or rename a file, we
have two choices for our design.
Use Shell Commands. We can exploit the shell commands of cp or mv (Windows: copy and
rename). To do this, we have to break our processing down into tiny pieces, some of which are
Python programs, and others are shell commands. We can use a shell script (or .BAT file) to
jump back and forth between the Python steps and the shell command steps.
Use the shutil Module. On the other hand, we can use shutil and do everything in Python,
improving performance and simplifying the processing down to a single Python program.
shutil.copy(source, destination)
Copy data and mode bits, basically the GNU/Linux command cp source destination. If destination
is a directory, a file with the same base name as source is created. If destination is a full
file name, this is the destination file.
shutil.copyfile(source, destination)
Copy data from source to destination. Both names must be les.
shutil.copytree(source, destination)
Recursively copy the entire directory tree rooted at source to destination. destination must not
already exist. If errors occur, a shutil.Error exception is raised.
shutil.rmtree(path)
Recursively delete a directory tree rooted at path.
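The following sketch exercises three of these functions inside a throwaway directory, so nothing real is touched. The file names are hypothetical; the temporary directory comes from the standard tempfile module.

```python
import os
import shutil
import tempfile

# Work inside a temporary directory created just for this demonstration.
work = tempfile.mkdtemp()
source = os.path.join(work, "report.txt")   # hypothetical file name
with open(source, "w") as target:
    target.write("some data\n")

# Keep a backup copy of one file.
backup = os.path.join(work, "report.bak")
shutil.copy(source, backup)

# Copy the whole directory tree; the destination must not already exist.
tree_copy = work + "_copy"
shutil.copytree(work, tree_copy)
copied_names = sorted(os.listdir(tree_copy))
print(copied_names)

# Remove both trees when we're done.
shutil.rmtree(tree_copy)
shutil.rmtree(work)
```

Everything here happens in a single Python program, with no shell commands in between.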
glob The GNU/Linux shell expands wild-cards to complete lists of file names; the verb is to glob (really).
The glob module makes the name globbing capability available to Windows programmers. The glob
module includes the following function that locates all names which match a given pattern.
glob.glob(wildcard) → list
Return a list of filenames that match the given wild-card pattern. The fnmatch module is used
for the wild-card pattern matching.
A common use for glob is something like the following.
This can make Windows programs process command line arguments somewhat like Unix programs.
Each argument is passed to glob.glob() to expand any patterns into a list of matching files. If the
argument is not a wild-card pattern, glob simply returns a list containing this one file name, provided
that a file with that name exists.
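This kind of argument expansion can be sketched as below. The function name expand_args() and the file names are hypothetical, and a temporary directory stands in for the files a user would name on the command line.

```python
import glob
import os
import shutil
import tempfile

def expand_args(args):
    """Expand each argument as a wild-card pattern into matching file names."""
    names = []
    for arg in args:
        names.extend(sorted(glob.glob(arg)))
    return names

# Build a few empty files to match against.
work = tempfile.mkdtemp()
for file_name in ("a.py", "b.py", "notes.txt"):
    open(os.path.join(work, file_name), "w").close()

# A real script would call expand_args(sys.argv[1:]).
matches = expand_args([os.path.join(work, "*.py"), os.path.join(work, "notes.txt")])
base_names = [os.path.basename(name) for name in matches]
print(base_names)

shutil.rmtree(work)
```

The results of each pattern are sorted here because glob.glob() makes no promise about the order of the names it returns.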
fnmatch The fnmatch module has the essential algorithm for matching a wild-card pattern against file
names. This module implements the Unix shell wild-card rules. These rules are used by glob to locate
all files that match a given pattern. The module contains the following function:
fnmatch.fnmatch(filename, pattern) → boolean
Return True if the filename string matches the pattern string.
The patterns use * to match any number of characters, ? to match any single character. [letters]
matches any of these letters, and [!letters] matches any letter that is not in the given set of letters.
>>> import fnmatch
>>> fnmatch.fnmatch('greppy.py','*.py')
True
>>> fnmatch.fnmatch('README','*.py')
False
fileinput The fileinput module helps you read complex collections of text files in a relatively simple
way. This is particularly helpful for creating grep-like processing, where your application reads all of
the files in a large directory tree.
filecmp The filecmp module contains a number of functions that help you build file comparison programs. This
is handy for expanding on the basic diff program. It is also helpful for moving beyond simple file
comparison into comparing two complete directory structures or comparing sections of complex documents.
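To give a feel for the grep-like processing mentioned under fileinput, here is a minimal sketch that searches two small files for a word. The file names and contents are invented, and the files live in a temporary directory.

```python
import fileinput
import os
import shutil
import tempfile

# Create two small text files to search.
work = tempfile.mkdtemp()
paths = []
for file_name, text in (("one.txt", "alpha\nbeta\n"), ("two.txt", "gamma\nalpha beta\n")):
    path = os.path.join(work, file_name)
    with open(path, "w") as target:
        target.write(text)
    paths.append(path)

# fileinput.input() presents all the files as one long sequence of lines.
hits = []
for line in fileinput.input(files=paths):
    if "alpha" in line:
        base = os.path.basename(fileinput.filename())
        hits.append((base, fileinput.filelineno(), line.rstrip("\n")))
fileinput.close()

for hit in hits:
    print(hit)
shutil.rmtree(work)
```

The functions fileinput.filename() and fileinput.filelineno() report where the current line came from, which is what a grep-style report needs.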
os.sep
The (or the most common) pathname separator character ( / generally, \ on Windows). Most of
the Python library routines will translate the standard / for use on Windows.
It is better to use the os.path module to construct or parse path names.
os.altsep
The alternate pathname separator (None generally, or / on Windows).
os.pathsep
The component separator used in $PATH (: generally, ; on Windows).
os.linesep
The line separator in text files (the standard newline character, \n, or the Windows variant,
\r\n). This is already part of the readlines() function and the file iterator.
os.defpath
The default search path that the operating system uses to find an executable file.
os.chdir(path)
Change the current working directory to path.
import os
os.chdir( "/Volumes/Slott02/Writing/Tech/PFNP/Notes" )
os.getcwd() → string
Return the current working directory path.
import os
print(os.getcwd())
os.remove(filename)
Delete ( remove, unlink or erase) the file.
os.unlink(filename)
Delete ( remove, unlink or erase) the file.
Graphic User Interface Applications. GUI applications include IDLE, your favorite word processor,
spread sheet and web browser. Most of what we use computers for are the GUI applications. In a few cases,
the GUI application is a wrapper or veneer that surrounds an underlying command-line application.
There are a few central fittings to making a useful GUI application. An excellent example is IDLE.
The input comes from files as well as the human user. The human user's input is handled by a
sophisticated graphics library, like pyGTK or Tkinter. This library unifies mouse and keyboard events,
and shares these devices politely with all other applications.
The results go to files as well as the human user. Display to the user is handled by a graphics library.
This library supports the broad variety of display devices, and shares this device politely with all the
other applications.
The application's behavior is controlled through interactive point-and-click. This is called an
event-driven interface. The user's commands are events to which the application responds.
You will often create a file object, given the name of a disk file. After all, that's usually the point of using
an application. The Python programming will read or write characters using that file object.
Since you're using the graphics library to interact with the mouse, keyboard and display devices, you won't
use files for these user interaction devices directly.
Web Applications. You use a web application when you run a web browser like FireFox, Safari, Chrome,
Opera or Internet Explorer. Your browser is a GUI application: it reads from the mouse and keyboard
and displays back to the user. Browsers use sophisticated graphics libraries, some of which are highly tailored
toward doing browsing.
More important, however, is the role the browser plays in the overall application. A browser application
connects you with a web server. When you request a web page (by typing the URL or clicking on a link),
your browser makes a request from a web server. When you fill in a form and click a submit (or search or
buy now) button, you are making a request of a web server.
Writing a web application means putting the right programming on a web server. Web programming happens
in a variety of forms, and uses a number of different languages. The reason for the complexity of web
applications is to spread out the workload and allow a large number of people to make requests and efficiently
share the web server.
The core of web applications is the HTML language. When you make a web request, the reply is almost
always a page of HTML. Your web browser opens a kind of file called a socket. The browser writes the
request, and then reads the reply. The reply will be HTML which is rendered and presented as a page of
content.
Serving Web Content. On the other side of the web transaction, the web server is waiting for requests
from browsers. The server reads the request, locates the content, and sends the HTML page to the browser.
The browser will also request the various pieces of media (graphics, sounds, etc.), which are sent separately.
Some HTML pages are static, which means that the web server takes an HTML file from the disk and sends
it through the internet to your browser. This job is very simple and easily standardized. A program named
Apache httpd handles this job very nicely.
Some HTML pages are dynamic, which means that some program created customized HTML, and sent this
through the internet to your browser. Often, this program will be a partner with Apache httpd. Generally,
you'll simplify your life by using a web framework for this kind of programming.
File operations you might use in a web program.
Create a file object, given the name of a disk file that exists on the web server.
Read files that are located on a web server.
Open a file that connects to yet another web server and get data from another server to prepare data
for presentation on a web page.
You don't have access to the user's computer or anything on the user's computer; only the browser can do
that. All of your file operations are confined to your web server. You can, through HTML, make it easy for
someone to download files to their desktop computer, but you have no direct access.
The general approach is to use any of the Python web-frameworks. You can research Django, TurboGears,
Quixote and Zope to see a spectrum of just a few alternatives. There are dozens of frameworks to help you
manage these popular kinds of applications.
Embedded Control Applications. Let's imagine that we are inventing a new kind of heat pump controlled
by a computer. We've bought our heating and refrigeration coils, we've got a reversing valve and a
variable-speed motor. We've rigged up a working set of hardware in our garage, but we need a computer and software
to control all of this hardware.
We'll need to create interfaces that transform information from the outside world like temperature, pressure,
valve position, motor speed into electronic signals the computer can read. We'll also need to transform
electronic signals into actions like starting a motor or changing a valve position. We need to purchase and
configure the necessary computer parts. We also need to write device drivers.
Our device drivers are the glue that connects our file system to our temperature probes, coolant pressure
sensors, valve position sensor and motor speed indicator. Each of these devices can appear as a file. When
we read from the temperature file, for example, our driver uses this request to gather information from the
thermistor, encode that as a number, and provide this number to our program.
While there's a large amount of computer engineering involved, you will still use some standard file operations.
You will create a file object, given the name of a device which appears as a file. You will read or write
data using that file object.
CHAPTER
THIRTEEN
a mutable collection may have method functions to change the collection by adding and removing
elements.
A really complex type, like a file, has many attributes, some of which come from outside the Python
environment. Attributes include a name, a modification time, a size, permissions. A file is associated
with operating system resources, and a file's operations will move data to or from external devices.
Each object is an instance of a class. A class defines the attributes and operations of each object that is a
member of the class. We'll use the words type and class interchangeably.
A typical program will be written as a number of class definitions and a final main function. The main function's
job is to create the objects required to perform the job of the program. The program's behavior is the result
of interactions among these objects. This parallels the way that a business enterprise is the net effect of the
interactions among the people who purchase materials, create products, sell the products, receive payment
and manage the finances.
Complex (complex). These are a pair of floating-point numbers of the form (a + bj), where a
is the real part and b is the imaginary part. These values have a number of operations, including
arithmetic operations and equality comparisons.
Sequence. The sequence types are collections of objects identified by their order or position, instead
of a key. All sequences have a few operations to concatenate and repeat the sequence. Sequences have
in and not in operations to determine if an item is part of the sequence. Additionally, sequences have
the [] operation which selects an item or a slice of items.
Immutable sequences are created as needed and can be used but never changed.
* String (str). A string is a sequence of individual ASCII characters. Strings have a number
of operations that return facts about the string or transform the string and create a new
string.
* Unicode (unicode). A Unicode string is a sequence of individual Unicode characters.
Unicode strings have a number of operations that return facts about the string or transform
the string and create a new string.
* Tuple (tuple). A tuple is a sequence of Python items. It has a few operations for accessing
individual items in the tuple.
Mutable sequences can be created, appended to, changed, and have elements deleted.
* List (list). A list is a sequence of Python items. Operations like append() and pop() can
be used to add to or remove from a list. Operations like sort() can change the order of the
list.
Set. A set is a simple collection of objects. There is no ordering or key information. This makes sets
very efficient. Sets have add() and remove() operations, as well as in and not in operations.
Mapping. A mapping is a collection of objects identified by keys instead of order.
Dictionary (dict). A dictionary is a collection of objects (values) which are indexed by other
objects (keys). It is like a sequence of key:value pairs, where keys can be found efficiently.
Any Python object can be used as the value. Keys have a small restriction: mutable lists and
other mappings cannot be used as keys. Dictionaries have the [] operation to select an element
from the dictionary. Dictionaries have methods like has_key() to determine if a key is present
in the dictionary. Dictionaries also have methods like items(), keys() and values() to produce
sequences from the contents of the dictionary.
Default Dictionary. We had to import this from the collections package. The defaultdict
behaves just like a dictionary in every respect but one. When we attempt to get a value that's
not in the dictionary, it evaluates a default function.
Callable. When we create a function with the def statement, we create a callable object. There
are a number of attributes; for example, the __name__ and func_name attributes both have the
function's name. There is one important operation, calling the function. That is, performing the
eval-apply cycle (see The Evaluate-Apply Rule for a review) on the function's argument values.
File (file). Python supports several operations on files, most notably reading, writing and closing.
Python also provides numerous modules for interacting with the operating system's management of
files.
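The following short sketch touches one object of several of the types reviewed above. The variable names and values are made up for this illustration; only the built-in operations themselves come from the descriptions above.

```python
from collections import defaultdict

name = "heat pump"                  # str: an immutable sequence
upper_name = name.upper()           # transforms the string, creating a new string

point = (3, 2)                      # tuple: an immutable sequence of items

rolls = [4, 1]                      # list: a mutable sequence
rolls.append(6)                     # add an item
rolls.sort()                        # change the order of the list

seen = set(rolls)                   # set: no ordering, no keys
seen.add(4)                         # 4 is already a member, so nothing changes

color = {0: "green", 1: "red", 2: "black"}   # dict: values indexed by keys

counts = defaultdict(int)           # missing keys get the default value int(), which is 0
for r in [4, 1, 6, 4]:
    counts[r] += 1

print(upper_name, point[0], rolls, len(seen), color[1], counts[4], counts[5])
```

Note that asking the defaultdict for counts[5] does not raise an error; it evaluates the default function and quietly produces 0.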
As with other real-world things, it's easier to provide a lot of examples than it is to work up an elaborate,
legalistic definition. Objects are like art: I can't define it, but I know what I like. As hard as it is, we'll give
the definition a whirl, because it does help some people write better software.
Each object encapsulates both data and processing into a single definition. We'll sometimes use synonyms
and call these two facets structure and behavior, attributes and operations or instance variables and method
functions. The choice of terms depends on how philosophical or technical we're feeling. The structure and
behavior terms are the most philosophical; the attribute and operation terms are generic object-oriented
design terms. Instance variables and method functions are the specific ways that Python creates attributes
and operations to reflect structure and behavior.
In Python, we can understand objects by looking at a number of features, adapted from [Rumbaugh91].
Identity. An object is unique and is distinguishable from all other objects. In the real world, two
identical coffee cups occupy different locations on our desk. In the world of a computer's memory,
objects can be identified by their address. Unless we do something special, the built-in id() function
gives us a hint about the memory location of an object, revealing the distinction between two objects.
We can see this by doing id("abc"), id("defg"), which shows that two distinct objects are being
examined.
State. Many objects have a state, and that state is often changeable. The object's current state is
described by its attributes, implemented as instance variables in Python.
Our two nearly identical coffee cups have distinguishing attributes. The locations (back-left corner of
desk, on the mouse pad) and the ages (yesterday's, today's) are attributes of each cup of coffee. I can
change the location attribute by moving a cup around. Even if both cups are on the back-left corner,
the cups have unique identity and remain distinct. I can't easily change the age; today's coffee remains
today's coffee until enough time has passed that it becomes yesterday's coffee.
In the software world, my two strings ("abc" and "defg") have different attribute values. Their lengths
are different, and they respond differently to various method functions like upper() and lower().
As a special case, some objects can be stateless. While most objects have a current state, it is possible
for an object to have no attributes, making it like a function. Such objects have no hysteresis, that is, no
memory of any previous actions.
Behavior. Objects have behavior.
terminology, its method functions.
other objects, and dont do much
do considerable processing. These
methods.
A coffee cup really only has a few behaviors: it admits additional coffee (to a limit), it stores a finite
amount of coffee, and coffee can be removed. Coffee cups are passive and don't initiate these behaviors.
The coffee machine, however, is an active object. The coffee machine has a timer, and can perform its
behavior of making coffee autonomously.
String objects have a large number of behaviors, defined by the method functions, many of which we
looked at in Sequences of Characters : str and Unicode. All of our collection classes can be considered
as passive objects.
Classification. Objects with the same attributes and behavior belong to a common class. Both of our
string objects ("abc" and "defg") belong to a common class because they have the same attributes (a
string of characters) and the same behavior.
Inheritance. A class can inherit operations and attributes from a parent class, reusing common
features. A superclass is a generalization. A subclass overrides superclass features or adds new features,
and is a specialization.
Both of our coffee cups are instances of cup, which is a subclass of a more general class, drinking
vessel. This more general class includes other subclasses like glassware and stemware.
When we described the string data type, we put it into a broader context called sequence and emphasized
the common features that all sequence types had. We also emphasized the unique features that
defined the various subclasses of sequence. All of the sequence types have the [] operator to select an
individual item. Only strings, however, had an upper() method function. Only lists had the append()
method function.
Polymorphism. A general operation, named in a superclass, can have different implementations
in the various subclasses. We saw this when we noted that almost every class in Python has a +
operation. Between two floating-point numbers the + operation adds the numbers; between two lists,
however, the + operation concatenates the lists. Because objects of these distinct classes respond to a
common operator, they are polymorphic.
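The features above can be seen together in a small sketch built on the coffee cup analogy. The class names DrinkingVessel and Cup, and their attributes and methods, are hypothetical; they are invented here only to illustrate identity, state, behavior, classification, inheritance and polymorphism.

```python
class DrinkingVessel(object):
    """A general class: anything that holds a drink (hypothetical example)."""
    def __init__(self, location, capacity):
        self.location = location     # state: instance variables
        self.capacity = capacity
        self.contents = 0
    def fill(self, amount):
        """Behavior: admit additional coffee, up to a limit."""
        self.contents = min(self.capacity, self.contents + amount)
    def describe(self):
        return "vessel at " + self.location

class Cup(DrinkingVessel):
    """A specialization that overrides one superclass feature."""
    def describe(self):
        return "cup at " + self.location

cup1 = Cup("back-left corner of desk", 8)
cup2 = Cup("back-left corner of desk", 8)

# Identity: same attribute values, but two distinct objects.
print(cup1 is cup2, id(cup1) == id(cup2))

# State and behavior: only cup1 changes.
cup1.location = "on the mouse pad"
cup1.fill(12)
print(cup1.contents, cup2.contents)

# Classification and inheritance: a Cup is also a DrinkingVessel.
print(isinstance(cup1, DrinkingVessel), cup1.describe())

# Polymorphism: the + operation means different things for different classes.
print(1.5 + 2.5, [1] + [2])
```

Both cups share one class definition, yet each carries its own location and contents.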
Program Design. Up to this point in our programming career, we've been looking at our information
needs and the available Python structures. If it was a temperature, we used a number; for the color of a
space on the Roulette wheel, we used a string. In the case of something more complex, like a pair of dice,
we used a function which created a tuple.
As we become more sophisticated, we begin to see that the various types of data that are built-in to Python
aren't exactly what we need. It isn't possible to foresee all possible problems. Similarly, it isn't possible to
predict all possible kinds of data and processing that will be required to solve the unforeseeable problems.
That's why Python lets us define our own, brand-new types of data.
Class Definition. Python permits us to define our own classes of objects. This allows us to design an
object that is an exact description of some part of our problem. We can design objects that reflect a pair of
dice, a Roulette wheel, or the procedure for playing the game of Craps. A class definition involves a number
of things.
The name of the new class.
An optional list of any classes that are the basis for this class definition. If there are any, we call these
other classes the superclasses for our new class. Generally, we'll use the class object as the superclass
for our class definitions.
All of the method functions for this new class. Each method is, in effect, another function of this class.
Defining a method function is just like defining a function, and involves three things.
The name of the method function.
A list of zero or more parameters to this function. In order to identify the specific object instance,
all method functions have one mandatory parameter.
A suite of statements for this method function.
The object's attributes (also called instance variables) are not formally defined as part of the class. They
are generally created by a special method function that is executed each time an object is created. This
initialization method function is allocated responsibility for creating the object's instance variables and
assigning their initial values.
Object Creation. After we define the class, we can create instances of the class. Every object is an instance
of one or more classes. Each object will have unique identity; it will have a distinct set of instance variables;
it will be identified by a unique object identifier. Objects have an internal state, defined by the values
assigned to the object's instance variables. Additionally, each object has behavior based on the definitions
of the method functions. An object is said to encapsulate a current state and a set of operations.
Because every object belongs to one or more defined classes, objects share a common definition of their
attributes and methods. The class definition can also specify superclasses, which helps provide method
functions. We can build a family tree of classes and share superclass definitions among a variety of closely-related subclasses.
13.1. Objects: A Retrospective
It helps to treat each class definition as if the internal implementation details were completely opaque. A
class should be considered as if it were a contract that specifies what the class does, but keeps private all
of the details of how the class does it. All other objects within an application should use only the defined
methods for interacting with an object. When we use a list's append() method, we know what will happen,
but we don't know precisely how the list object adds the new item to the end of the list. Unlike Java
and C++, Python has a relatively limited mechanism for formalizing this distinction between the defined
interface and the private implementation of a class.
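One widely used Python convention for marking implementation details is a leading underscore on a name. The following is a minimal sketch of that convention, not something this chapter prescribes; the Thermostat class and its methods are hypothetical.

```python
class Thermostat(object):
    """Hypothetical class: clients use target() and adjust(), nothing else."""
    def __init__(self, target):
        # The leading underscore signals "implementation detail" by convention.
        # Python does not prevent access; it relies on programmers honoring the hint.
        self._target = target
    def target(self):
        """Defined interface: report the current setting."""
        return self._target
    def adjust(self, delta):
        """Defined interface: change the setting by delta degrees."""
        self._target += delta

t = Thermostat(68)
t.adjust(2)
print(t.target())
```

A client that sticks to target() and adjust() keeps working even if the class later stores its setting in some other way.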
Life Cycle of an Object. Each object in our program has a life cycle. The following is typical of most
objects.
Definition. The class definition is read by the Python interpreter or it is built-in to the language.
Class definitions are created by the class statement. Examples of built-in classes include files, strings,
sequences, sets and mappings. We often collect our class statements into a file and import the class
definitions to a program that will use them.
Construction. An object is constructed as an instance of a class: Python allocates memory that it
will use for tracking the unique ID of the object, storing the instance variables, and associating the
object with the class definition. An __init__() method function is executed to initialize the attributes
of the newly created instance.
Access and Manipulation. The object's methods are called (similar to function calls we covered in
Better Arithmetic Through Functions) by client objects, functions or scripts. There is a considerable
amount of collaboration among objects in most programs. Methods that report on the state of the
object are sometimes called accessors; methods that change the state of the object are sometimes called
manipulators.
Garbage Collection. Eventually, there are no more references to this instance. For example, consider
a variable with an object reference which is part of the body of a function. When the function finishes,
the variable no longer exists. Python detects this, and removes the object from memory, freeing up
the storage for subsequent reuse. This freeing of memory is termed garbage collection, and happens
automatically. See Garbage Collection for more information.
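The last stage of the life cycle can be observed with the standard weakref module, which lets us watch an object without keeping it alive. This is a sketch only: the Marker class and make_and_drop() function are hypothetical names, and the exact moment an object is removed is an implementation detail of the interpreter, which is why gc.collect() is called before checking.

```python
import gc
import weakref

class Marker(object):
    """Hypothetical class whose only job is to be created and discarded."""
    def __init__(self):
        self.value = None

def make_and_drop():
    m = Marker()                 # construction
    m.value = 42                 # access and manipulation
    return weakref.ref(m)        # a weak reference does not count as a reference
    # When the function finishes, the variable m no longer exists.

probe = make_and_drop()
gc.collect()                     # ask Python to finish any pending cleanup
print(probe() is None)           # the weak reference now leads nowhere
```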
Important: Class and Instance
Once we've defined the class, we only use the class to make individual objects. Objects (instances of a class)
do the real work of our program.
When we ask a string to create an upper case version of itself ("hi mom".upper()), we are asking a specific
object ("hi mom") to do the work. We don't ask the general class definition of string to do this. The meaning
of str.upper() isn't very clear.
This can be a little mystifying when we start to define our own classes. The problem usually stems from
confusing class definitions with function definitions. We don't use instances of a function for anything, we
use the function itself. Functions, consequently, are a bad model of how class definition works. Classes are
a kind of factory for creating objects. Objects do the real work.
The most important examples to keep in mind are string objects, file objects and list objects. These are the
most typical examples of the kinds of objects we'll create. Each string (or file or list) object is an instance
of the respective class definition.
Under the hood, the definition of a class creates a new class object. This class object is used to create
the instance objects that do the work of our program. The class object is mostly just a container for the
suites of statements that define the method functions of a class. Additionally, a class object can also own
class-level variables; these are, in effect, shared by each individual object of that class. They become a kind
of semi-global variable, shared by objects of a given class.
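A small sketch may make the distinction between class-level variables and instance variables concrete. The sides variable is an assumption added for this illustration; it is not part of the Die class defined in this chapter.

```python
class Die(object):
    """A die with a class-level variable (illustrative variant)."""
    sides = 6                    # class-level: owned by the class object, shared by all instances
    def __init__(self):
        self.value = None        # instance variable: private to each object

d1 = Die()
d2 = Die()

d1.value = 3                     # changes d1 only
Die.sides = 20                   # changes what every Die sees

print(d1.value, d2.value)
print(d1.sides, d2.sides)
```

Changing Die.sides is visible through both d1 and d2, which is why class-level variables behave like semi-global variables.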
Garbage Collection
It is important to note that Python counts references to objects. When an object is no longer referenced,
its reference count is zero and the object can be removed from memory. This is true for all objects,
especially objects of built-in classes like strings. This frees us from the details of memory management.
When we do something like the following:
s = "123"
s = s + "456"
Look at some of your earlier exercises in Organizing Programs with Function Definitions. Identify the
life-span of all objects created by a specific function.
class className :
    suite of method defs
The className is the name of the class. This name will be used to create new objects that are instances
of the class. Traditionally, class names are capitalized and class elements (variables and methods) are not
capitalized.
We'll generally provide a superclass of object in Python 2. This provides some benefits that, while
important, are beyond the scope of the book.
The suite of defs is a series of definitions for the method functions of the class. This is indented within the
class definition.
The suite of defs can contain any Python programming. Generally, we try to limit our class definition to the
following things:
A comment string (often a triple-quoted string) that provides basic documentation on the class. This
string becomes a special attribute, called __doc__. It is available via the help() function.
Method function definitions.
Sometimes, we may provide class-wide constants: variable definitions that provide a handy
shorthand name for a value that doesn't change.
The heart of the class definition is the suite of method function definitions.
def name( self, parameter, ... ):
    suite
This definition looks just like a function definition, with two exceptions.
First, it's indented within the class statement suite.
Second, each of the method functions must have a first positional argument, self, which Python uses to
manage the unique object instances. When referring to any method function or instance variable of the
class, the instance qualifier self. must be used.
Example Definition. Here's an example of a class with a single method definition. This class models a
real-world object, a die. Note the indentation of the class definition suite. Each method function def begins
at one level of indentation; each method function's suite is at a second level of indentation.
die.py
#!/usr/bin/env python
import random
class Die( object ):
    """Simulate a 6-sided die."""
    def roll( self ):
        """Return a random roll of a die."""
        u= random.randrange(6)
        self.value= u+1
        return self.value
3. We defined the simple class named Die. The indented suite contains the docstring and the single
method function of this class.
4. The docstring has a pithy summary of the class. As with function docstrings, the class docstring is
retrieved with the help() function.
5. We defined a single method function: roll(). The instance variable, self, provides access to any
variables that are created in an instance of this class. All functions that are part of a class are provided
with the instance reference as the first positional parameter.
7. When this method is executed, it sets a local variable, u, to a random value between 0 and 5. Since
this variable has no instance qualifier, it is local to the function and will vanish when the function
finishes.
8. The next statement sets an instance variable, self.value, to the random value plus 1. Since self.value
is qualified by the instance, the variable is part of the state of an object and lives as long as the object
does.
The processing steps are silly, but they show the difference between a local variable, u, that doesn't live
with the object, and an instance variable, self.value, that defines the state of the object.
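We can check the difference between the local variable and the instance variable directly. This sketch repeats the Die class from die.py and then uses the built-in hasattr() function to ask the object which names it actually carries.

```python
import random

class Die(object):
    """Simulate a 6-sided die."""
    def roll(self):
        """Return a random roll of a die."""
        u = random.randrange(6)      # local variable: gone when roll() finishes
        self.value = u + 1           # instance variable: part of the object's state
        return self.value

d = Die()
result = d.roll()

print(hasattr(d, "value"))           # the object remembers its value
print(hasattr(d, "u"))               # the object never had a u
```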
Tip: Debugging a Class Definition
When we get syntax errors on a class definition, it can be in the class line or one of the internal method
function definitions.
If we get a simple SyntaxError on the first line, we have misspelled class, left off a ( or ), or omitted the
: that begins the suite of statements that defines the class.
If we get a syntax error further in the class definition, then our method functions aren't defined correctly.
Be sure to indent the def once (so it nests inside the class). Be sure to indent the suite of statements inside
the def twice.
3. We use our Die class to create two variables, d1 and d2; both are new objects, instances of Die.
5. We evaluate the roll() method of d1; we also evaluate the roll() method of d2. Each of these calls
sets an object's value variable to a random number. There's a pretty good chance (1 in 6)
that both values might happen to be the same. If they are, simply evaluate d1.roll() and d2.roll()
again to get new values.
We print the value variable of each object. The results aren't too surprising, since the value attribute
was set by the roll() method. This attribute will be changed the next time we evaluate the roll()
method.
9. We also ask for a representation of each object. Unless we provide a method named __str__() in
our class, this is what Python reports. Note that the numbers are different, indicating that these are
distinct objects, each with private instance variables.
Note that we used the class definition to make two objects, d1 and d2. The objects are the focus of our
program. We have manipulators (like the roll() method) and accessors (the value attribute) for these
objects.
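The steps described above can be sketched as a short script. This is an illustrative reconstruction that follows the description (create two Die objects, roll each, print the values, print the objects); it is not necessarily the original listing, and the values and memory addresses printed will differ on every run.

```python
import random

class Die(object):
    """Simulate a 6-sided die."""
    def roll(self):
        """Return a random roll of a die."""
        u = random.randrange(6)
        self.value = u + 1
        return self.value

# Two new objects, both instances of Die.
d1 = Die()
d2 = Die()

# Each call sets that object's own value variable.
d1.roll()
d2.roll()
print(d1.value, d2.value)

# Without a __str__() method, Python reports the class and a location in memory.
print(d1)
print(d2)
```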
Tip: Debugging Object Construction
Assuming we've defined a class correctly, there are three things that can go wrong when attempting to
construct an object of that class.
The class name is spelled incorrectly.
You've omitted the () after the class name. If we say d= Die, we've assigned the class object, Die, to
the variable d. We have to say d= Die() to use the class name as a factory and create an instance of
a class.
You've got incorrect argument values for the parameters of the __init__().
If we get a NameError: name 'Hack' is not defined, then the class (Hack, in this example) is not actually
defined. This could mean one of three things. First, our class definition had errors in the first place. Second,
the class name in our definition isn't spelled the same as in our object creation (either we spelled it wrong
when defining the class, or spelled it wrong when using the class to create an object). Third, we have defined
the class in a module, imported it, but forgot to qualify the class name with the module name.
If our class wasn't defined, it means we either forgot to define the class, or overlooked the SyntaxError when
defining it. If our class has one name and our object constructor has another name, that's just carelessness;
pick a name and stick to it. If we are trying to import our definitions, we can either qualify the names
properly, or use from module import * as the import statement.
Another common problem is using the class name without ()s. If we say d= Die, we've assigned the class
object (Die) to the variable d. We have to say d= Die() to create an instance of a class.
If we've defined our class properly, we can get a message like TypeError: __init__() takes exactly 2
arguments (1 given) when we attempt to construct an object. This means that our __init__() method
function doesn't match the object construction call that we made.
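The three construction problems can be provoked on purpose. In this sketch the Die class is a hypothetical variant whose __init__() requires a sides argument, chosen only so that a missing argument produces an error; the exact wording of the error messages varies between Python versions, so the sketch records only the kind of exception.

```python
class Die(object):
    """A die whose number of sides must be supplied (hypothetical variant)."""
    def __init__(self, sides):
        self.sides = sides

problems = []

d = Die                  # no parentheses: d is just another name for the class
is_class = d is Die

try:
    Die()                # missing the argument that __init__() requires
except TypeError:
    problems.append("TypeError")

try:
    Hack()               # no class named Hack was ever defined
except NameError:
    problems.append("NameError")

d = Die(6)               # the class name used correctly as a factory
print(is_class, problems, d.sides)
```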
The __init__() function must have a self parameter name, and it must be first. When we construct an
object, we don't provide an argument value for the self parameter, but we must provide values for all of the
The attributes, however, do not have formal definitions. Each object's attributes are implemented through
instance variables, which, like all Python variables, are created as needed by an assignment statement.
In order to guarantee that all of the instance variables exist during the entire life of the object, it is best
to initialize them by providing a method with the special name of __init__(). The __init__() method is
always called automatically by Python when the object is created; we can exploit this to assure a correct
initialization.
In this example, we updated our Die to add an __init__() function. This function will provide a default
value for the self.value attribute.
die.py, version 2
import random
class Die( object ):
    """Simulate a 6-sided die."""
    def __init__( self ):
        """Initialize the die."""
        self.value= None
    def roll( self ):
        """Return a random roll of a die."""
        self.value = random.randrange(6) + 1
        return self.value
Bonus Questions. In the first version of Die, what would happen if we did the following?
dx = Die()
print(dx.value)
dx.roll()
print(dx.value)
Compare this with what happens when we do this with the new version of Die. Which class has better
behavior?
Arguments to Control Initialization. Method functions can have parameters. All of the techniques
we've seen for ordinary function definitions apply to method functions. We can have additional positional
parameters after self, keyword parameters, default values, as well as the * and ** collections of additional
parameters.
As with all method functions, the __init__() method function can accept parameters. This allows us to
correctly initialize an object at the same time we are creating it. The object can begin its life in a specific
state. Since we don't call the __init__() function directly, this raises a question. How are argument values
assigned to the parameter variables?
The class name becomes a factory function that makes new instances of the class. When we evaluate the
class, using ()s, we can pass argument values to the class factory. The argument values we give to the class
factory are given to the __init__() method function.
For any class, C, if we say a= C( some values ), Python acts as though we said
a= C()
a.__init__( some values )
Example Class Definition. This next example is a class that defines a geometric point. The class provides
some operations that manipulate that point. When we create a Point instance, we'll provide an x and y
coordinate. To define the point (x,y)=(3,2), we could say Point(3,2). This would, in effect, do the following
for us: p= Point(); p.__init__( 3, 2 ).
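A Point class along these lines might look like the following. This is a minimal sketch, not necessarily the original listing: the method names offset() and offset2() and the attribute names x and y come from the surrounding text, while the parameter names xo, yo and val are assumptions.

```python
class Point(object):
    """A 2-D point with operations that move it (illustrative sketch)."""
    def __init__(self, x, y):
        """Create a point at the given coordinates."""
        self.x = x
        self.y = y
    def offset(self, xo, yo):
        """Move the point by xo along the x axis and yo along the y axis."""
        self.x += xo
        self.y += yo
    def offset2(self, val):
        """Move the point by the same amount along both axes."""
        self.offset(val, val)        # note the self. qualifier on the method name

p = Point(2, 3)      # the argument values 2 and 3 are handed to __init__()
p.offset(-1, 2)
p.offset2(-2)
print(p.x, p.y)
```

Starting from (2,3), the first move leaves the point at (1,5) and the second leaves it at (-1,3).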
Here's an example of creating a Point at coordinates (2,3) via Point(2,3) and then manipulating that point.
First we move it -1 unit on the x axis and 2 units on the y axis. Then we move it -2 on both axes.
>>> from point import Point
>>> p = Point(2,3)
>>> print(p)
<point.Point instance at 0x98d148>
>>> print(p.x, p.y)
2 3
>>> p.offset( -1, 2 )
>>> print(p.x, p.y)
1 5
>>> p.offset2( -2 )
>>> print(p.x, p.y)
-1 3
After using the offset() and offset2() manipulations, the point is now at (-1,3).
Other Special Names. In addition to the specially-named __init__() method, there are many other
specially-named methods that are automatically used by Python; these special methods can simplify our
programming. After __init__(), the next most important special method function name may be __str__().
The __str__() method is used to return the string representation of the object. For example, we can add
this method to our Point class to return an easy-to-read string for a Point.
class Point( object ):
    # ...other methods...
    def __str__( self ):
        return "({0:d},{1:d})".format(self.x, self.y)
Don't Forget self. Within a class, we must be sure to use self. in front of the function names as well as
attribute names. For example, our offset2() function accepts a single value and calls the object's offset()
function using the supplied value for both x and y offsets.
In this case, we created an object, d1, which is defined by the Die class. When we say Die(), we are creating
a new object, and implicitly evaluating Die.__init__() to initialize that object.
After creating an instance of Die, we then evaluated the roll() method of that instance. This method
updates the instance variable, self.value, with a new random number. It also returns the value of the
instance variable.
A method which returns information without changing any of the instance variables is sometimes called an
accessor. A method which changes an instance variable is sometimes called a manipulator.
Tip: Debugging Class vs. Object Issues
Perhaps the biggest mistake newbies make is attempting to exercise the method functions of a class instead
of a specific object. You can't simply say Die.roll(); you'll get the cryptic TypeError: unbound method
error message. The phrase unbound method means that no instance was being used.
When you say d1= Die(), you are creating an instance. When you say d1.roll(), you are asking that
specific object to do its roll() operation.
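The difference between asking the class and asking an object can be shown in a few lines. The wording of the TypeError message differs between Python versions (newer versions complain about a missing self argument rather than an unbound method), so this sketch records only that a TypeError occurred.

```python
import random

class Die(object):
    """Simulate a 6-sided die."""
    def __init__(self):
        self.value = None
    def roll(self):
        self.value = random.randrange(6) + 1
        return self.value

try:
    Die.roll()               # asking the class: there is no instance to roll
    outcome = "no error"
except TypeError:
    outcome = "TypeError"

d1 = Die()                   # create an instance
rolled = d1.roll()           # ask that specific object to do its roll() operation
print(outcome, 1 <= rolled <= 6)
```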
import random
1. We import the random module. The Die class will collaborate with the random module to simulate a
single die.
3. We define Die. See die.py, version 2.
6. We define Dice, which will collaborate with Die to simulate a pair of dice.
8. The __init__() method creates an instance variable, myDice, which has a tuple of two instances of
the Die class. The __init__() method is often called a constructor.
11. The roll() method changes the overall state of a given Dice object by changing the two individual
Die objects it contains. This kind of method is often called a manipulator. It uses a for loop to assign
each of the internal Die objects to variable d. It then calls the roll() method of each Die object.
This technique is called delegation: a Dice object delegates the work to two individual Die objects.
16. The getTotal() and getTuple() methods return information about the state of the object. These
kinds of methods are often called accessors. Sometimes they are called getters because their names
often start with get.
The getTotal() method computes a sum of all of the Die objects. It uses a for loop to assign
each of the internal Die objects to d. It then accesses the value instance variable of each instance
of Die.
The getTuple() method returns the values showing on each Die object. It uses a list comprehension
to create a list of the value instance variables of each Die object. The built-in function
tuple converts the list into a tuple.
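A pair of classes matching the description above could be sketched as follows. This is a reconstruction from the numbered notes, not the original listing, so its layout and line numbers will not match the notes exactly; the names Dice, myDice, roll(), getTotal() and getTuple() are taken from the notes.

```python
import random

class Die(object):
    """Simulate a 6-sided die (as in die.py, version 2)."""
    def __init__(self):
        self.value = None
    def roll(self):
        self.value = random.randrange(6) + 1
        return self.value

class Dice(object):
    """Simulate a pair of dice."""
    def __init__(self):
        """Create the two Die objects this Dice object will manage."""
        self.myDice = (Die(), Die())
    def roll(self):
        """Manipulator: delegate the rolling to each individual Die."""
        for d in self.myDice:
            d.roll()
    def getTotal(self):
        """Accessor: return the sum of the two dice."""
        t = 0
        for d in self.myDice:
            t += d.value
        return t
    def getTuple(self):
        """Accessor: return the values showing on each Die, as a tuple."""
        return tuple([d.value for d in self.myDice])

x = Dice()
x.roll()
print(x.getTotal(), x.getTuple())
```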
A Function Which Uses Die and Dice. The following function exercises an instance of this Dice class
to roll two dice a dozen times and print the results.
import die
def test2():
    x= die.Dice()
    for i in range(12):
        x.roll()
        print(x.getTotal(), x.getTuple())
This function creates an instance of Dice, called x. It then enters a loop to perform a suite of statements 12
times. The suite of statements first manipulates the Dice object using its roll() method. Then it accesses
the Dice object using the getTotal() and getTuple() methods.
Alternatives. The roll() method could also be written as
def roll( self ):
    [ x.roll() for x in self.myDice ]
This will apply the roll() method to each Die in myDice. Interestingly it also creates a list object, holding
whatever each Die object's roll() method returns (the new values, for the Die class shown in die.py, version 2).
Since the list isn't assigned to a variable, it quietly blinks out of existence and is lost forever. So, each time
Dice.roll() is called a little throwaway list is created and removed.
The getTotal() method could also be written as
def getTotal( self ):
    "Return the total of two dice."
    return sum( d.value for d in self.myDice )
$$\mathrm{SACR} = \frac{33(s - f)}{t(d + 33)}$$
Typically, you will average the SACR over a number of similar dives. You will want to create a Dive
class with start pressure, finish pressure, time and depth. Typical values are a starting pressure of
3000, ending pressure of 700 to 1500, depth of 30 to 80 feet and times of 30 minutes (at 80 feet) to 60
minutes (at 30 feet). SACRs are typically between 10 and 20. Your Dive class should have a function
named getSACR() which returns the SACR for that dive.
To make it a little simpler to put the data in, we'll treat time as a string of the form HH:MM, and use string
functions to pick this apart into hours and minutes. We can save this as a tuple of two integers, hours
and minutes. To compute the duration of a dive, we need to normalize our times to minutes past
midnight, by doing hh*60+mm. Once we have our times in minutes past midnight, the difference is the
number of minutes of duration for the dive. You'll want to create a function getDuration() to do just
this computation for each dive.
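To check the arithmetic before building the class, here is a small worked example using plain functions. It assumes the SACR formula reads 33(s - f) / (t(d + 33)), with s and f the start and finish pressures, t the time in minutes and d the depth in feet. The function names minutes_past_midnight() and sacr() are hypothetical helpers for this sketch; they are not the methods the exercise asks for.

```python
def minutes_past_midnight(hhmm):
    """Convert an "HH:MM" string to minutes past midnight (hh*60+mm)."""
    hh, mm = hhmm.split(":")
    return int(hh) * 60 + int(mm)

def sacr(start, finish, minutes, depth):
    """Surface air consumption rate, assuming SACR = 33(s - f) / (t(d + 33))."""
    return 33.0 * (start - finish) / (minutes * (depth + 33))

# A dive from 11:52 to 12:45 at 35 feet, using 3100 - 1300 PSI of air.
duration = minutes_past_midnight("12:45") - minutes_past_midnight("11:52")
rate = sacr(3100, 1300, duration, 35)
print(duration, round(rate, 2))
```

The dive lasts 53 minutes, and the resulting rate falls inside the typical range of 10 to 20 mentioned above.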
class Dive(object)
__init__(start, finish, in_time, out_time, depth)
Initialize a Dive with the start and finish pressure in PSI, the in and out times as strings, and
the depth as an integer. The parameters are named in_time and out_time because in is a reserved
word in Python and cannot be used as a parameter name. This method should parse both the in_time
string and out_time string into time tuples of hours and minutes. The parseTime() function can be
used to do this for both the in time and the out time.
Note that a practical dive log would have additional information like the date, the location, the
air and water temperature, sea state, equipment used and other comments on the dive.
__str__()
Return a nice string representation of the dive information.
getSACR()
Compute the SACR value from the starting pressure, final pressure, time and depth information.
The duration can be computed using the getDuration() function.
parseTime(hhmm_string)
Pick apart a HH:MM time and convert the strings to integers to produce a 2-tuple of hours and
minutes after midnight.
getDuration(in_time, out_time)
Accepts two 2-tuples of hours and minutes, normalizes these to minutes past midnight, and returns
the difference. This is the dive's duration in minutes.
We'll want to initialize our dive log as follows:
log = [
    Dive( start=3100, finish=1300, in_time="11:52", out_time="12:45", depth=35 ),
    Dive( start=2700, finish=1000, in_time="11:16", out_time="12:06", depth=40 ),
    Dive( start=2800, finish=1200, in_time="11:26", out_time="12:06", depth=60 ),
    Dive( start=2800, finish=1150, in_time="11:54", out_time="12:16", depth=95 ),
]
Your application can then process a sequence of Dives, get the SACR for each dive, and compute the
average SACR over all the dives in the dive log. Here's a start on the final program.
total= 0
for d in log:
    print(d, d.getSACR())
    total += d.getSACR()
print(total, len(log))
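The time arithmetic is the fiddly part of this exercise, so here is a hedged sketch of just the two helper functions plus the SACR calculation. The function names parseTime() and getDuration() come from the exercise; the sacr() helper and its parameter names are assumptions made for illustration, and the formula used is 33(s - f) / (t(d + 33)). It leaves the Dive class itself for you to write.

```python
def parseTime(hhmm_string):
    """Turn an "HH:MM" string into a 2-tuple of integers (hours, minutes)."""
    hh, mm = hhmm_string.split(":")
    return int(hh), int(mm)

def getDuration(in_time, out_time):
    """Minutes between two (hours, minutes) tuples from the same day."""
    in_minutes = in_time[0] * 60 + in_time[1]      # minutes past midnight
    out_minutes = out_time[0] * 60 + out_time[1]
    return out_minutes - in_minutes

def sacr(start, finish, minutes, depth):
    """Hypothetical helper: 33(s - f) / (t(d + 33)), using float division."""
    return 33.0 * (start - finish) / (minutes * (depth + 33))

duration = getDuration(parseTime("11:52"), parseTime("12:45"))
print(duration)                          # 712 and 765 minutes past midnight: 53
rate = sacr(3100, 1300, duration, 35)
print("%.2f" % rate)
```

The first dive in the log works out to a rate between 16 and 17, comfortably inside the typical 10 to 20 range, which is a useful sanity check on your own getSACR().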
2. Stock Valuation.
A block of shares in a stock has a number of attributes, including a purchase price, purchase date, and
number of shares in the block. Commonly, methods are needed to compute the total spent to buy the
stock, and the current value of the stock. An investor may have multiple blocks of stock in a company;
this collection is called a Position.
Beyond a simple collection of shares are larger groupings. A Portfolio, for example, is a collection of
Positions; it has methods to compute the total value of all positions of stock. We'll look at Position
and Portfolio in a subsequent exercise. For now, we'll just look at a block of shares.
When we purchase stocks a little at a time, each block of shares has a different price. We want the
total value of the entire set of shares, plus the average purchase price for the set of shares as a whole.
First, define a ShareBlock class which has the purchase date, price per share and number of shares.
class ShareBlock(object)
__init__(self, purchDate, purchPrice, shares)
Populate the individual instance variables with date, price and shares. We'll define another class
with the ticker symbol that can act as a container for several of these blocks for a particular
company.
__str__(self)
Return a nice string that shows the date, price and shares.
getPurchValue(self)
Compute the purchase value as the price × shares.
getSaleValue(self, salePrice)
Given a salePrice, compute the sale value as salePrice × shares.
getROI(self, salePrice)
Given a salePrice, compute the return on investment as (sale value - purchase value) ÷ purchase value.
We can load our database with a piece of code that looks like the following. The first statement will
create a sequence with four blocks of stock. We chose a variable name that would remind us that the
ticker symbol for all four is GM. The second statement will create another sequence with four blocks,
this time for EK.
blocksGM = [
    ShareBlock( purchDate='25-Jan-2001', purchPrice=44.89, shares=17 ),
    ShareBlock( purchDate='25-Apr-2001', purchPrice=46.12, shares=17 ),
    ShareBlock( purchDate='25-Jul-2001', purchPrice=52.79, shares=15 ),
    ShareBlock( purchDate='25-Oct-2001', purchPrice=37.73, shares=21 ),
]
blocksEK = [
    ShareBlock( purchDate='25-Jan-2001', purchPrice=35.86, shares=22 ),
    ShareBlock( purchDate='25-Apr-2001', purchPrice=37.66, shares=21 ),
    ShareBlock( purchDate='25-Jul-2001', purchPrice=38.57, shares=20 ),
    ShareBlock( purchDate='25-Oct-2001', purchPrice=27.61, shares=28 ),
]
Once we have the ShareBlock class working, we can move on to processing the entire position.
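If you'd like something to compare your own work against, here is one possible sketch of ShareBlock. It is not the only reasonable design. In particular, the ROI here is assumed to be the gain divided by the amount originally paid, and the exact layout of the __str__() result is a free choice.

```python
class ShareBlock(object):
    """One purchase of stock: date, price per share, number of shares."""
    def __init__(self, purchDate, purchPrice, shares):
        self.purchDate = purchDate
        self.purchPrice = purchPrice
        self.shares = shares
    def __str__(self):
        return "%s: %d shares at %.2f" % (
            self.purchDate, self.shares, self.purchPrice)
    def getPurchValue(self):
        return self.purchPrice * self.shares
    def getSaleValue(self, salePrice):
        return salePrice * self.shares
    def getROI(self, salePrice):
        # Assumed definition: (sale value - purchase value) / purchase value.
        purchased = self.getPurchValue()
        return (self.getSaleValue(salePrice) - purchased) / purchased

block = ShareBlock(purchDate='25-Jan-2001', purchPrice=44.89, shares=17)
print(block)
print("%.2f" % block.getPurchValue())
print("%.4f" % block.getROI(50.00))
```

Because the prices are floating-point numbers, the division in getROI() is ordinary float division, even without from __future__ import division.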
3. Stock Position.
In Stock Valuation, we looked at a block of stock shares. A collection of these blocks represents a
position on that stock. We can define an additional class, Position, which will have the name,
symbol and a sequence of ShareBlocks for a given company.
class Position(object)
__init__(self, name, symbol, block_list)
Accept the company name, ticker symbol and a collection of ShareBlock instances.
__str__(self)
Return a string that contains the symbol, the total number of shares in all blocks and the total
purchase price for all blocks.
getPurchValue(self)
Sum the purchase value for each block.
getSaleValue(self, salePrice)
Given a salePrice, sum the sale value for each block.
getROI(self, salePrice)
Given a salePrice, compute the return on investment as (sale value - purchase value) ÷ purchase value.
This is an ROI based on an overall yield.
We can create our Position objects with the following kind of initializer. This creates a sequence of
three individual Position objects; one has a sequence of GM blocks, one has a sequence of EK blocks
and the third has a single CAT block.
portfolio= [
    Position( "General Motors", "GM", blocksGM ),
    Position( "Eastman Kodak", "EK", blocksEK ),
    Position( "Caterpillar", "CAT",
        [ ShareBlock( purchDate='25-Oct-2001',
            purchPrice=42.84, shares=18 ) ] )
]
You can now write a main program that writes some simple reports on each Position object in the
portfolio, and the overall portfolio. This report should display the individual blocks purchased. This
should be followed with a total price paid, and then the overall average price paid (the total paid
divided by the total number of shares).
4. Statistics Library.
We can create a class which holds a sequence of samples. This class can have functions for common
statistics on the object's sequence of samples.
For additional details on these algorithms, see the exercises in Doubles, Triples, Quadruples : The
tuple and the exercises in Common List Design Patterns.
class Samples(object)
__init__(self, sequence)
Save a sequence of samples in an instance variable. It could, at this time, also precompute a
number of useful values, like the sum, count, min and max of this set of data.
__str__(self)
Return a summary of the data. An example is a string like "%d values, min %g, max %g,
mean %g" with the number of data elements, the minimum, the maximum and the mean.
mean(self)
Return the sum divided by the count.
min(self)
Return the smallest value in the sequence of data values.
max(self)
Return the largest value in the sequence of data values.
variance(self)
The variance() is a more complex calculation. For each sample, compute the difference
between the sample and the mean, square this value, and sum these squares. The number of
samples minus 1 is the degrees of freedom. The sum, divided by the degrees of freedom, is the
variance.
stdev(self)
Return the square root of the variance.
mode(self)
The mode() returns the most popular of the sample values. The following algorithm can be
used to locate the mode of a set of samples.
Computing the Mode
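As a hedged illustration, here is one common way to compute the variance, standard deviation and mode as plain functions. The counting approach used for the mode is an assumption on our part and may differ from the procedure this exercise has in mind. When two values tie for most popular, this sketch simply returns one of them.

```python
import math
from collections import defaultdict

def variance(samples):
    """Sum of squared differences from the mean, over (count - 1)."""
    mean = sum(samples) / float(len(samples))
    squares = sum((x - mean) ** 2 for x in samples)
    return squares / (len(samples) - 1)      # degrees of freedom

def stdev(samples):
    return math.sqrt(variance(samples))

def mode(samples):
    """Count how often each value occurs; return a most frequent value."""
    counts = defaultdict(int)
    for x in samples:
        counts[x] += 1
    return max(counts, key=counts.get)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print("%.4f" % variance(data))   # 32 / 7
print("%.4f" % stdev(data))
print(mode(data))                # 4 appears three times
```

Inside your Samples class, each of these becomes a method that works on the instance variable holding the sequence.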
CHAPTER
FOURTEEN
cover the import statement that lets us use a module. We'll look at variations on the import statement in
Some Variations On the import Statement.
A module has some additional layers of meaning that we'll touch on in Thinking In Modules, and the
Declaration of Dependence. We'll look at the technique of abstraction, again, and show how this applies to
designing modules in Dividing and Conquering - The Art Of Design. This chapter ends with some style
notes in Style Notes and some FAQs in Module FAQs.
#!/usr/bin/env python
2. Docstring Lines. The second line of a module file should be a triple-quoted string that defines the
contents of the module file. As with other Python doc strings, the first line of the string is the pithy
summary of the module. This is followed by a more complete definition that describes what the module
does and how we should use it.
"""die.py - basic definitions for Die and Dice
class Die defines a single 6-sided die.
class Dice defines a pair of 6-sided dice.
functions test1 and test2 perform simple sanity checks of the module.
"""
3. From Future Lines. If the module needs any future features, there might be a from __future__
import statement to introduce any Python 3 features.
from __future__ import division
In addition to two class definitions, which is typical, this module includes two test functions, test1() and
test2(). These can be used to assure that the various elements of the module work correctly. They also
serve as a kind of documentation for how the module should be used.
... ]
In this form, Python locates the module file, opens it, reads and evaluates the Python statements. The
result of any def, class or assignment is to build objects within the module.
After this statement is executed, the named module object is fully populated and ready for use.
For example.
>>> dir()
['__builtins__', '__doc__', '__name__']
>>> import math
>>> dir()
['__builtins__', '__doc__', '__name__', 'math']
>>> dir(math)
['__doc__', '__file__', '__name__', 'acos', 'asin', 'atan', 'atan2', 'ceil', 'cos', 'cosh', 'degrees', 'e', 'exp',
For example, if a script uses import die, it must create objects of the die.Die class or the die.Dice class.
The module itself, internally, can't use qualified names. Within our die module, the definition of the Dice
class is in the same module as the definition of the Die class, so the names are not qualified.
Here's an example of using the die module.
>>> import die
>>> d= die.Dice()
>>> d.roll()
>>> print(d.total())
4
More About Modules and Names. In Keeping Track of Variable Names - The Namespace we talked
about a function having a local namespace. A namespace keeps the variables defined within a function
separate from all other functions' variables and any global variables. This namespace is created when the
function is evaluated, and disposed of when the evaluation is finished. All of the variables created inside the
function's suite of statements are silently disposed of.
When we looked at class definitions, we used a number of namespaces. A method function has a local
namespace, just like an ordinary function. Further, the self parameter defines the namespace for the object's
instance variables. To use an instance variable, we use a qualified name: self.variable provides the namespace,
a dot, and the variable from within that namespace. Each object is effectively a namespace for the object's
attributes; the namespace is created when the object is created.
When a module is imported, the module, as a single object of class module, is imported into the global
namespace. Each individual class, function or variable defined within the module's file becomes part of the
module's local namespace. This assures us that each module can define classes, functions and variables
without worrying about conflicts with other modules. When we use a two-part name (die.Dice, or math.sqrt),
we provide the namespace first, then the name within the namespace.
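We can poke at a module's namespace directly. This small sketch uses the standard math module to show that the module really is a single object, and that its names live inside it rather than in our own namespace.

```python
import math

kind = type(math).__name__
print(kind)                      # the imported thing is an object of class module
print("sqrt" in dir(math))       # sqrt is a name inside the module's namespace
root = math.sqrt(16)             # namespace first, then the name within it
print(root)
```

Trying plain sqrt(16) after import math raises a NameError, because the name sqrt was never placed in our own namespace.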
Important: Bad Behavior
Importing a module means that the module file is executed similar to a script file. This means that all of
the statements in the module are executed.
The standard expectation is that a library module will contain only definitions. Any executable statements
should be inside function definitions. Some modules create module global variables; this must be fully
documented. It is bad behavior for an imported module to attempt to do any real work beyond creating
definitions. Any additional work that a module does is unexpected and makes the module hard to use.
Python doesn't enforce this distinction between a script, which does something useful, and a library module,
which defines things that a script will use. It is purely a matter of best practice in designing Python
modules and programs.
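One widely used way to keep a module well-behaved is to put any "real work" in a function and call it only when the file is run directly. The sketch below assumes a hypothetical file named greeting.py; the __name__ test is standard Python, but the module and function names are made up for illustration.

```python
"""greeting.py - a hypothetical library module that contains only definitions."""

def greet(name):
    """Return a greeting. Nothing is printed when the module is imported."""
    return "Hello, %s" % name

def main():
    # The "real work" lives inside a function, not at the top level.
    print(greet("world"))

if __name__ == "__main__":
    # True only when this file is run as a script, not when it is imported.
    main()
```

With this layout, import greeting quietly creates the definitions, while running the file as a script does the visible work.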
... ]
When we want to use a function from the math module, we must tell Python which namespace contains
the function by putting the module name and a dot (.): math.sin(0.7854), for example. This explicit
qualification is important to everyone else who'll be reading your program. Using the module name with the
function or class name makes the origin of the object clear.
Exposing Certain Names. Another variation on import introduces selected names from the module into
the local namespace. This import has the form:
from module import name [, ... ]
This performs the basic module import but moves the given names into the local namespace. The selected
names can be used without the overall module namespace qualifier.
For example:
>>> locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', '__doc__': None}
>>> from math import sin, cos, tan
>>> locals()
{'cos': <built-in function cos>, '__builtins__': <module '__builtin__' (built-in)>, 'sin': <built-in function sin>,
>>> sin(0.7071)
0.64963178370469132
from module import *
This makes all names available in the local namespace. This is only appropriate for interactive debugging.
Since this makes the origin of the name obscure, we suggest avoiding it.
Renaming A Module. Another variation on import allows you to have multiple, competing implementations for a module.
import module as newName
This retains the explicit qualification of names so we can see which module the name belongs to. But the
qualifying name is the new name.
We often use this when we have different variants that we want to use in a consistent manner. For example,
we may have several random number generator modules. Each module can have the same set of class and
function names, but different algorithms to implement those classes and functions.
option = "Standard" # or "Homebrew"
if option == "Standard":
    import random as rnd
else:
    import homebrewed_lcm as rnd
print(rnd.randrange(1,20))
In this case, we've imported random and named it rnd. We can easily change this to import homebrewed_lcm
as rnd. This gives us a way to make a very easy switch between alternative implementations.
The same technique is often used for database modules. Often a number of databases will all use the
DB-API interface. This allows a single application program to work with any one of a number of
compatible database modules.
We also use this technique for XML or JSON file parsers; we might have several alternative XML parsers.
We can use the import as statement to select which variant we want.
Finally, we can also use this to create a handy short-hand for a long module name. This can happen when
you have packages of modules, and the package path names get long. This lets you keep your programming
short by using an abbreviation for a longer, more precise module name.
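Here is a small sketch of the short-hand idea, using a package from the standard library whose full name is a bit of a mouthful. The alias ET is a common convention, not a requirement, and the tiny XML document is made up for the example.

```python
import xml.etree.ElementTree as ET   # a short alias for a long package path

doc = ET.fromstring("<dice><die>3</die><die>4</die></dice>")
values = [int(die.text) for die in doc.findall("die")]
print(values)
print(sum(values))
```

Every use is still qualified, so a reader can see where fromstring() came from; only the qualifier is shorter.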
there will be processing details that don't really matter to someone who uses the module. In some cases,
the details may be just a function definition. In other cases, the details may be a number of complex class
definitions.
A module reflects some knowledge about data and processing. Our module must reflect a tidy,
easy-to-explain bundle of concepts. This is the coherence or conceptual integrity principle. A module isn't a
jumble of stuff in one file. It reflects concepts we use to simplify our programming.
Knowledge Capture. A program in Python represents knowledge. We have a spectrum of Python
programming concepts from the very fine-grained statements to the very inclusive packages of modules and
application programs. This spectrum includes the following ways of composing and grouping knowledge.
Statement. A statement makes a specific state change. The assignment statement will update
variables in our Python environment. We use other statements (like if statements or while statements)
to choose precisely which assignment statements get executed.
Function. A function groups a suite of statements to compute a specific result or perform a specific
task. Functions are designed to be an indivisible, atomic unit of work. Functions can be easily
combined with other functions, but never taken apart. A well-chosen function name clarifies and
defines a single, useful concept.
Class. A class groups a set of related functions and the private data elements they share. The
class represents several closely related tasks, always with a narrowly defined responsibility. Classes
may be simple, perhaps only a single function or attribute, or a complex collection of attributes and
method functions. The intent is to clearly delineate responsibility for maintaining data or performing
an algorithm.
Module. A module is a group of any Python definitions, including variables, classes and functions.
A module should provide a closely related set of one or more class definitions and related convenience
functions and objects. The conceptual integrity of a module is its central feature. We put things into
a module because they are closely related.
Package. A package is a group of modules: the directory structure of the package is the directory
that contains all the modules. Additionally, packages contain some additional files that Python uses
to locate all of the elements of the package.
Application. The application is the top-most executable script that a user invokes when they want
to do useful work. There is a relationship between the commands that a user sees and the packaging
of components that implement those commands. This relationship reveals two different concerns:
usability and maintainability. The application-level view (the command or GUI presented to the user)
should be focused on usability. The design of modules and packages is focused on the technical
concerns of maintenance and adaptability.
Import Processing - The Declaration of Dependence. When we write a script that imports a module,
we are making a formal Declaration of Dependence. We are saying that our script depends on another module.
Python does several things to honor this dependency.
1. If the module is known, this means that it has already been imported, and nothing more needs to be
done. This has the pleasant consequence of allowing a complex program with many modules to be
very casual about the order of the various import statements.
An import statement is really only executed the first time it is seen. Every other time, it's just a
Declaration of Dependence.
2. If the module is not known, it needs to be imported.
(a) The path is searched to find the module file. If it cannot be found, an exception is raised. We'll
return to the path concept later.
(b) Once found, the module file is read and executed. All of the resulting variables, functions and
classes are part of the module's namespace.
(c) The import statement can do some optional processing to assure that module elements are
created in the global namespace. This is the from module import ... kind of import.
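Python's list of "known" modules is visible as the dictionary sys.modules. That name isn't used elsewhere in this section, so treat this as a side note. The sketch shows that a second import of the same module finds the existing object instead of executing the file again.

```python
import sys
import math                      # first import: the module becomes "known"

known = "math" in sys.modules
print(known)
first = sys.modules["math"]

import math                      # second import: nothing is re-executed
same = math is first
print(same)                      # it is the very same module object
```

This is why it's harmless for several of your modules to each import math (or each other's common helpers) without worrying about who did it first.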
Primarily, an import statement is a declaration of the support a module requires. In a more complex
application, there may be several libraries, each with its own collection of imports. This can create a web
of dependencies among the various components.
Additionally, a module import is a form of execution, just like executing a script. A module, therefore, can
do processing as well as define functions and classes. It is best if any processing is carefully segregated from
the definitions. That's why we make a firm distinction between library modules and main programs.
File System Locations. For modules to be available for use, the Python interpreter must find the module
file on the module search path. When you provide an import statement, Python searches each of the
directories on the path for the named module.
You can see Python's understanding of the path by importing the sys module and looking at the variable
sys.path. This variable lists all the places Python will look for a module. There are several consequences to
this use of sys.path to contain a list of directories.
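A quick way to look at your own search path is to number the entries as you print them. This is only a sketch; the directories you see will depend entirely on your computer and your Python installation.

```python
import sys

# Number each directory so it is easy to talk about "location 4" and so on.
for number, directory in enumerate(sys.path, 1):
    print("%2d %r" % (number, directory))
```

Using %r shows each entry with its quotes, which makes an empty string (the current working directory) easy to spot.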
Here's a Windows search path. Note that Windows uses \ as a file path separator, and Python strings use
\ as an escape character, which leads to using \\ as the Python language representation of a single \. This
sys.path lists more than a dozen locations where modules can be found.
[Listing: a 24-line display of sys.path; its contents are not reproduced here.]
This list shows the built-in parts of Python, the add-ons I've downloaded, and all of the various projects
that I'm using this computer for. Let's examine the search path to see where things will be found.
4. '' is the first location. The empty string stands for your current working directory.
5. The next locations (lines 5 to 12) are Python packages provided as .egg files. These are all tools that
I downloaded and added to my Python environment.
13. This is a local project. It was included via the PYTHONPATH environment variable.
14. The next eight locations are the essential ingredients of Python itself. This starts with
...python26.zip and ends with ...python2.6/site-packages.
23. The next two locations are Python packages that are not provided as .egg files. These are also tools
that I downloaded and added.
Setting Your Path. To be sure a module can be imported, you have to be sure that the file is found on
Python's sys.path. To do this, you have to do any one of the following things.
Put your module into the directory that Python sees as the current working directory. This works
great for learning, but in the long run, you don't want to have your working files and your programs
all piled together in the same directory. Eventually, you'll want to install your programs someplace
more permanent, and separate from your working data files.
When you are working in IDLE, IDLE will put a module's working directory in the path when you
load or run the module.
Put your module into an existing library of modules. Python has a directory called Lib/site-packages
in which you can put your own modules. This is usually associated with the Python installation
directory. See Let There Be Python: Downloading and Installing for more information. In the example
above, many modules were located in my site-packages directory.
Most Python modules and packages include a setup.py file which will properly install the module into
the site-packages directory.
Extend the list of directory paths by putting a .pth file in the site-packages directory. A .pth file
is a small text file that lists, one per line, directories to add to the search path.
Extend the list of directory paths to include the directory for your module by changing the Python
environment. In GNU/Linux, the PYTHONPATH environment variable can be used to define the
directories expected to contain modules.
In the Windows environment, the Python_Path symbol in the Windows registry is used to locate
modules as well as the PYTHONPATH environment variable.
Windows. We use a command like SET PYTHONPATH=path\to\module to name directories that
Python should also search for modules. Separate each directory name with ;.
All Other OS. We use a command like export PYTHONPATH=path/to/module to name
directories that Python should also search for modules. Separate each directory name with :.
See the sidebar, Debugging Imports, for more information on determining why your module won't
import.
Since the sys.path object is a list, there are two important consequences.
Lists are mutable. Your program can add directories to this list. This can be confusing and hard to
maintain. Some applications do this, however, to provide a very high degree of flexibility. You may
read someone else's program which updates sys.path. It isn't the best policy.
Lists are ordered. If a module occurs in two directories, Python will locate the one that's first in the
search path. You can use this to create a test version in your local directory which is loaded before
the released version in your site-packages directory.
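Both consequences can be seen in a few lines. This sketch writes a tiny, hypothetical module named scratch_die into a temporary directory, puts that directory at the front of sys.path, and then imports it. The module name and its SIDES variable are made up for the demonstration; as the text says, changing sys.path in a real program is rarely the best policy.

```python
import os
import sys
import tempfile

# Create a scratch directory holding a one-line, hypothetical module.
scratch = tempfile.mkdtemp()
with open(os.path.join(scratch, "scratch_die.py"), "w") as module_file:
    module_file.write("SIDES = 6\n")

# Lists are mutable: we can add a directory.
# Lists are ordered: position 0 is searched before everything else.
sys.path.insert(0, scratch)

import scratch_die
print(scratch_die.SIDES)
print(scratch_die.__file__)      # shows which directory supplied the module
```

If a released scratch_die also existed in site-packages, the copy in the scratch directory would be the one found, because its directory comes first.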
For now, we can put our modules into the same directory as our main script. When we open a file in IDLE,
opening that file also changes what IDLE sees as the current working directory. Keeping a script and the
related modules in one directory is the minimum we need to do to assure that our script can import our
modules.
Tip: Debugging Imports There are four things that can go wrong with an import: (1) the module can't
be found; (2) the module isn't valid Python; (3) the module doesn't define what you thought it should define;
(4) the module name isn't unique and some other module with the same name is being found.
Be sure the module's .py file name is correct, and it's located on the sys.path. Module filenames are
traditionally all lowercase, with minimal punctuation. Some operating systems (like GNU/Linux) are
case-sensitive and a seemingly insignificant difference between Random.py and random.py can make your module
impossible to find.
The two most visible places to put module files are the current working directory and the Python
Lib/site-packages directory. For Windows, this directory is usually under C:\python26\. For
GNU/Linux, this is often under the /usr/lib/python2.6/ directory. For MacOS users, this will be in
the /System/Library/Frameworks/Python.framework/Versions/Current/ directory tree.
If your module isn't valid Python, you'll get syntax errors when you try to import it. You can discover the
exact errors by trying to execute the module file using the F5 key in IDLE.
If the module doesn't define what you thought, there are two likely causes: the Python definitions are
incorrect, or you've omitted a necessary module-name qualifier. For example, when we do import math
everything in that module requires the math qualifier. Within a module, however, we don't need to qualify
names of other things defined in the same module file.
If your Python class or function definitions aren't correct, it has nothing to do with the modularization. The
problem is more fundamental. Starting from something simple and adding features is generally the best way
to learn.
The sys.path is a list, which is searched in order. Your working directory is searched first. When your module
has the same name as some extension module, your module will conceal that extension module. I've spent
hours discovering that my module named Image was concealing PIL's Image module.
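When you suspect that the wrong module is being found, you can ask the module where it came from. Most modules that are loaded from a file carry a __file__ attribute; this sketch checks the standard random module, but the same two lines work for any module you've imported.

```python
import random

# Which module did Python actually find, and in which file?
print(random.__name__)
origin = random.__file__
print(origin)
```

If the file shown is in your working directory rather than in Python's library, your own file is concealing the module you meant to use.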
#!/usr/bin/env python
"""Definition of class X and Y."""
class X( object ):
    does something
class Y( X ):
    does something a little different
x1= X()
x1.someMethod()
y2= Y()
y2.someOtherMethod()
You'll need to create two files from this. The module will be the simplest to prepare; assume the file name
is myModule.py.
New Library Module
#!/usr/bin/env python
"""Definition of class X and Y."""
class X( object ):
    does something
class Y( X ):
    does something a little different
Your new application will look like the following because you will have to qualify the class and function
names that are created by the module.
New Application Script
#!/usr/bin/env python
"""Program which uses X and Y."""
import myModule
x1= myModule.X()
x1.someMethod()
y2= myModule.Y()
y2.someOtherMethod()
1. die.py module.
Finalize the die module with the classes Dice and Die. Write a demonstration script that rolls dice
and gathers simple statistics to show that the distribution of dice rolls is what we expect: 2.77% of the
rolls are 2's, up to 16.6% which are 7's, and then back down to about 2.77% which are 12's.
2. roulette.py module.
Define a roulette module which includes the Wheel class definition. An instance of Wheel should
have a spin() method that returns a spin of the wheel, showing the number and the color. A separate
script should exercise the wheel to gather a number of spins showing how many red, how many black
and how many green (for zero or double zero).
3. divelog.py module.
The Dive Log exercise in Class Definition Exercises contains a definition of the Dive class. This should
be separated into the divelog module. A separate file can import this and use it to define a collection
of dives and compute SACR or other statistics on the collections of dives.
4. stock.py module.
The ShareBlock and Position exercises in Class Definition Exercises contain definitions of a number
of related classes. These classes should be separated into the stock module. A separate file can import
this module and use it to define a collection of stock positions and compute purchase value, current
value, annualized ROI or other statistics on the stocks.
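For the die.py exercise, it helps to know the exact percentages your statistics should approach. This sketch doesn't roll anything; it simply counts the 36 equally likely combinations of two dice, so you have a table to compare your simulated results against.

```python
from collections import defaultdict

# Count how many of the 36 combinations produce each total.
ways = defaultdict(int)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        ways[d1 + d2] += 1

for total in sorted(ways):
    print("%2d: %d in 36, %5.2f%%" % (total, ways[total], 100.0 * ways[total] / 36))
```

A total of 2 or 12 can happen only one way in 36, while a 7 can happen six ways in 36; a long enough run of simulated rolls should land close to these proportions.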
The Model-View-Control pattern is often combined with the Input-Process-Output pattern. The Input
and Output are lumped into a component called Persistence, View and Control are lumped into
Presentation, and the Process and Model are generally two names for the same thing. This gives us the Three
Tier Architecture or Presentation-Processing-Persistence. Often the presentation is handled by a web
server using Apache, the processing is handled by an application server running Python and the persistence
is handled by a database like MySQL. When run on GNU/Linux, they call this the LAMP architecture:
GNU/Linux, Apache, MySQL and Python.
While a web application can be complex, at the core each individual web page manages a simple transaction
which is an input-process-output pattern. A request defines the processing, input comes from the user or
files or a database, and the output is almost always a page of HTML.
Choosing Names. The bulk of most modules are class, exception and function definitions. Since the
module name implicitly qualifies everything created in the module, it is never necessary to put a prefix in
front of each name within a module to show its origin. This is common in other programming languages,
but never done in Python.
For example, a module that contains classes and functions related to statistical analysis might be called
stats.py. The stats module might contain a class for tracking individual samples. This class does not
need to be called statsSample or stats_sample. A client application that contains an import stats
statement would refer to the class as stats.Sample. Additional qualification is redundant.
The qualification of names sometimes devolves to silliness, with class names beginning with c_, function
names beginning with f_, the expected data type indicated with a letter, and the scope (global variables,
local variables and function parameters) all identified with various leading and trailing letters. This is not
done in Python programming. Class names begin with uppercase letters, functions begin with lowercase.
Global variables are identified explicitly in global statements. Most functions are kept short enough that
the parameter names are quite obvious.
Private Elements of a Module - None of Your Business. It is common to have parts of a module
that are intentionally private to the way the module is currently written. These are things that we might
change in the future when we think of a better way to handle them. Or, they could be parts of the module
that shouldn't be tampered with.
Any element of a module with a name that begins with a single leading underscore (_) is left out when a
client uses the import * form. When we use from stats import *, the names that begin with _ are
not inserted in the global namespace. These names are usable within the module, and a determined client
can still reach them with a qualified name, but by convention client modules leave them alone.
They're considered to be a private part of the implementation, not a public part of the interface.
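The underscore rule is easy to check for yourself. This sketch writes a small, hypothetical module named scratch_stats into a temporary directory and imports it with the * form into a separate dictionary, so we can inspect exactly which names arrived. The module name and its two functions are made up for the demonstration.

```python
import os
import sys
import tempfile

# Build a throwaway module with one public and one private function.
scratch = tempfile.mkdtemp()
with open(os.path.join(scratch, "scratch_stats.py"), "w") as module_file:
    module_file.write("def mean(data):\n")
    module_file.write("    return _total(data) / float(len(data))\n")
    module_file.write("def _total(data):\n")
    module_file.write("    return sum(data)\n")
sys.path.insert(0, scratch)

# Run the import inside its own namespace dictionary so we can look at it.
namespace = {}
exec("from scratch_stats import *", namespace)
print("mean" in namespace)       # the public name was brought in
print("_total" in namespace)     # the underscore name was left behind

import scratch_stats
print(scratch_stats._total([1, 2, 3]))   # still reachable when fully qualified
```

The last line is the reason the underscore is best thought of as a polite "keep out" sign rather than a lock.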
Exceptions. A common feature of modules is to create a module-wide exception class. The usual approach
looks like the following. Within a module, you would define an Error class as follows:
class Error( Exception ): pass
You can then raise your module-specific exception with the following.
raise Error( "additional notes" )
A client module or program can then reference the module's exception in a try statement as module.Error.
For example:
from __future__ import print_function
import aModule
try:
    aModule.aFunction()
except aModule.Error as ex:
    print("problem", ex)
    raise
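Since aModule is only a placeholder name, here is a self-contained sketch of the same pattern that you can run as-is. The roll_many() function and its message are hypothetical; the class definition, the raise and the except ... as form are the parts that carry over to your own modules.

```python
class Error(Exception):
    """Module-wide exception for a hypothetical dice module."""
    pass

def roll_many(count):
    """Pretend to roll count dice; complain about a silly request."""
    if count < 1:
        raise Error("count must be at least 1, got %r" % (count,))
    return [1] * count           # placeholder result, just for the sketch

try:
    roll_many(0)
except Error as ex:
    message = "problem: %s" % ex
    print(message)
```

Because Error is a subclass of Exception, a client that doesn't know about your module's Error can still catch it with a more general except Exception clause.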
How do separate modules make my programs simpler? Isn't it more complex to break a program into pieces?
When you look at the various modules available, most of them are rather large. A module that you
use is full of programming you didn't do. Clearly, it saves you time.
More importantly, a module allows your program to be conceptually simpler. Rather than a big pile of
details, you can make use of the concepts in the module. You can then work with larger, more complex
and more abstract data structures. It's much easier to work with a long number than to work with a
big list of individual digits.
What possible benefit is there in replacing a module? No module is perfect. A module which uses
the least memory may also be rather slow. A module which is fastest may use too much memory to
make your program work reliably. A really fast module may be difficult to understand and improve
on. A module that's easy to understand and improve may be too slow for your program.
Since no module can ever meet all possible performance needs, you need to be able to choose an
implementation. The Python library is full of alternative solutions to a common problem. Being able
to choose an optimal solution gives you the power to create the best possible solution to your unique
problem.
How do you draw the line on the contents of a module? You could put each function in a separate
module, or you could put everything in one module. How do you find appropriate middle ground?
This can be a difficult problem. The fundamental principle is the following: A Python Module is
the Unit of Reuse. When you design a module, you should be thinking of the module as a discrete
component of your overall solution.
A module will often have multiple, closely related, classes and function definitions. A module will
rarely have a single class or a single function, unless that single object can be used by itself in
several applications.
There are a number of guidelines for what makes a good module. The seminal article is D. L. Parnas,
"On the Criteria to Be Used in Decomposing Systems into Modules" [Parnas72]. Subsequent to this,
some additional rules have been applied to this problem. First, a module should be coherent; that is,
it should be easy to summarize in a short description. Second, modules should be loosely coupled; that
is, there is a well-defined interface and other modules only make use of the interface. The simpler an
interface is, the looser the coupling and the better the coherence.
3. Python displays the floating-point number in decimal notation. This means converting the binary
internal value to base ten.
You can try to use the round() function to lop off this extra little bit of error. For example, you might try to use
the following. What happens?
round( 2.35, 2 )
The rounded result is still a binary number, and we are still trying to express a binary fraction using decimal
digits.
Internal Bits and Bytes.
The Python literal 2.35 becomes a floating-point value of the form $m \times 2^e$. The mantissa, $m < 1$, is a
fraction of the form $\frac{n}{2^{53}}$.
Internally, the value is effectively this.
$$\frac{5{,}291{,}729{,}562{,}160{,}333}{2^{53}} \times 2^{2}$$
Which is very precise, but still not 2.35.
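You can check this internal value yourself. The following sketch uses math.frexp() to split 2.35 into a mantissa and exponent, and the fractions module to do exact arithmetic on the result. The variable names are chosen only for this illustration.

```python
import math
from fractions import Fraction

m, e = math.frexp(2.35)      # 2.35 == m * 2**e, with 0.5 <= m < 1
n = int(m * 2**53)           # the integer numerator of the mantissa

stored = Fraction(n, 2**53) * 2**e   # the exact value the hardware holds
wanted = Fraction(235, 100)          # the value we actually typed

print(n, e)
print(stored == wanted)
print(float(stored - wanted))        # the tiny error that is left over
```

The comparison shows that the stored value and the intended value differ, even though the difference is far too small to matter for most measurements.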
Handling Math. While rounding seems like a good idea, there is some sophistication required to handle
interest rates, which are often in small fractions of a percent. For example, if an interest rate is 8.25%, 0.0825,
we have 4 decimal places of precision that we have to preserve. If we apply this rate to a large amount of
money, say 123,456,789.10, the precise answer has 14 digits, six of which are to the right of the decimal
point. On some computers, floating-point numbers can't represent this many digits correctly.
What to do?
It's time to look at the decimal module.
There are several important things to note about creating and using decimal numbers.
The source is always a string. This has to be done because a Python float value, like 135.99, is
converted from Python's language (in base 10) to the computer's hardware representation (in base 2)
and some precision is lost in the process.
To avoid the Python language conversion, string literals are used.
A decimal object retains all the digits of a precise answer to any mathematical operation. In the case
of repeating fractions, there is a default upper limit of 28 digits.
A decimal object can return a new decimal object with a different quantization by rounding. There
are a number of rounding rules, and we'll look at them in detail, below.
decimal objects are considerably slower than float objects. For the most part, slow is relative and
decimal is fast enough for everything except processing JPEG images or MP3 sound samples.
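The points above can be tried out on the interest-rate problem. This is a small sketch that assumes the default decimal context (28 digits); the variable names are made up for the example.

```python
from decimal import Decimal

# Build the values from strings so no binary conversion happens.
principal = Decimal("123456789.10")
rate = Decimal("0.0825")

interest = principal * rate        # every digit of the product is kept
print(interest)

# A repeating fraction is cut off at the context's precision.
third = Decimal(1) / Decimal(3)
print(third)
print(len(third.as_tuple().digits))
```

The product keeps all six digits to the right of the decimal point, and the repeating fraction stops at the 28-digit limit of the default context.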
Here's another example, which is closely related to the stock price examples we looked at in Files, Contexts
and Patterns of Processing. Let's say that I bought 135 shares of Apple back when it was trading at $20.44.
What would I have made if I sold it at $80.25?
>>> purchase = Decimal("20.44")
>>> current = Decimal("80.25")
>>> shares = 135
>>> shares * purchase
Decimal("2759.40")
>>> shares * current
Decimal("10833.75")
>>> shares*(current-purchase)
Decimal("8074.35")
Technically, a decimal number is an immutable object. The arithmetic operations will create new numbers,
but they don't change an existing number. This means that decimal objects can be used as keys in a mapping.
Quantizing a Number. You can quantize any Decimal number to a specific number of digits before or after
the decimal place. We don't call this rounding because we're not always rounding; we may be truncating.
The general term for rounding or truncating is quantizing.
When you quantize, you can specify a rounding rule directly. Additionally, you can provide a general rounding
rule in the decimal context.
When quantizing, you provide a decimal number which has the desired number of decimal places to the
quantize() method of a number. Here's an example.
>>> import decimal
>>> from decimal import Decimal
>>> total= Decimal( "135.99" ) * Decimal( ".075" )
>>> total
Decimal("10.19925")
>>> total.quantize( Decimal('0.01') )
Decimal("10.20")
>>> total.quantize(Decimal("0.01"),decimal.ROUND_DOWN)
Decimal("10.19")
You can see that the default context specifies that values are rounded. However, we can specify a specific
rounding rule as part of the quantize operation. There are a number of rounding rules.
ROUND_CEILING. This rounds all fractions to the next higher number. They call this rounding
towards Infinity. Positive numbers will be rounded away from zero, getting larger. Negative
numbers will get smaller in magnitude, being moved closer to zero.
ROUND_DOWN. This chops off all fractions, rounding towards zero. Positive numbers will get smaller.
Negative numbers will get smaller in magnitude, being moved closer to zero.
ROUND_FLOOR. This rounds all fractions toward the next lower number. They call this rounding
towards -Infinity. Positive numbers will get rounded toward zero, getting smaller. Negative numbers
will get larger in magnitude, being moved away from zero.
ROUND_HALF_DOWN. This rounds off to the nearest number. A value in the middle is rounded toward
zero. When rounding a value to Decimal('1'), a value of 0.5 becomes zero.
ROUND_HALF_EVEN. This rounds off to the nearest number. A value in the middle is rounded to the
nearest even number. When rounding a value to Decimal('1'), a value of 1.5 becomes 2, while a
value of 0.5 becomes zero.
ROUND_HALF_UP. This rounds off to the nearest number. A value in the middle is rounded away from
zero. When rounding a value to Decimal('1'), a value of 0.5 becomes 1.
ROUND_UP. This rounds all fractions away from zero. Positive numbers will get larger.
Negative numbers will get larger in magnitude, being moved away from zero.
The default context uses ROUND_HALF_EVEN as the rounding rule. You can examine the current context with
decimal.getcontext().
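The rounding rules are easier to compare side by side. The following sketch quantizes a few sample values to Decimal('1') under every rule. The helper name round_all() and the choice of sample values are assumptions made only for this illustration.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP,
                     ROUND_UP)

RULES = [ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR, ROUND_HALF_DOWN,
         ROUND_HALF_EVEN, ROUND_HALF_UP, ROUND_UP]
one = Decimal("1")


def round_all(text):
    """Quantize the value in text to a whole number under each rule."""
    value = Decimal(text)
    return dict((rule, value.quantize(one, rounding=rule)) for rule in RULES)


for sample in ("0.5", "1.5", "-2.5"):
    results = round_all(sample)
    print(sample, [str(results[rule]) for rule in RULES])
```

Negative samples are the interesting ones: they are where ROUND_CEILING and ROUND_UP, or ROUND_FLOOR and ROUND_DOWN, stop agreeing with each other.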
Pennies and Dollars. The quantize method needs a Decimal object that is really only used to provide an
exponent. Some people find that creating that Decimal object is a bit too wordy.
Specifically, the expression someNumber.quantize( Decimal('0.01') ) seems to be a lot of typing for a
simple concept. Here's another approach that may be a little more clear.
We can define a pair of useful constants for rounding to pennies or dollars.
from decimal import Decimal, ROUND_UP
pennies = Decimal( "0.01" )
dollars = Decimal( "1.00" )
total= Decimal("123.45") * Decimal("0.075")
final= total.quantize( pennies, ROUND_UP )
The quantization method appears strange at first because we provide a Decimal object rather than the number of decimal positions. The built-in round() function rounds to a number of positions. The quantize()
method of a decimal number, however, uses a decimal number that defines the position to which to round.
If you get AttributeError: 'int' object has no attribute '_is_special', from the quantize()
function, this means you tried something like aDecimal.quantize(3). You should use something like
aDecimal.quantize(Decimal('0.001')).
Once in a while, we may have a function which uses a slightly different context from the rest of the program.
In this case, we want to save the old context, make a change, and then put the old context back into place.
We might do something like the following.
from decimal import localcontext, ROUND_HALF_UP

def someFunction():
    with localcontext() as ctx:
        ctx.rounding= ROUND_HALF_UP
        ctx.prec += 2  # the precision attribute of a context is named prec
        # Do the real work
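To confirm that the old context really is put back, we can look at the precision before and after calling such a function. This is a sketch only: the function name and the choice of dividing 1 by 3 are made up to give the extra digits something visible to do.

```python
from decimal import Decimal, getcontext, localcontext


def one_third_with_guard_digits():
    """Divide with two extra digits of precision, inside a local context."""
    with localcontext() as ctx:
        ctx.prec += 2
        return Decimal(1) / Decimal(3)


before = getcontext().prec
result = one_third_with_guard_digits()
after = getcontext().prec

print(before, after)
print(len(result.as_tuple().digits))
```

The result carries two more digits than the surrounding context allows, and the surrounding context is unchanged once the with statement finishes.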
purchDate='25-Jan-2001', purchPrice='44.89', shares='17' ),
purchDate='25-Apr-2001', purchPrice='46.12', shares='17' ),
purchDate='25-Jul-2001', purchPrice='52.79', shares='15' ),
purchDate='25-Oct-2001', purchPrice='37.73', shares='21' ),

purchDate='25-Jan-2001', purchPrice='35.86', shares='22' ),
purchDate='25-Apr-2001', purchPrice='37.66', shares='21' ),
purchDate='25-Jul-2001', purchPrice='38.57', shares='20' ),
purchDate='25-Oct-2001', purchPrice='27.61', shares='28' ),
You may notice that the file-reading exercises involve reading strings from files. The class creation
exercises create stock ShareBlock or Position objects using strings. You're now in a position to
combine file reading and object creation, along with the decimal package to do real work in Python.
14.3 Time and Date Processing : The time and datetime Modules
Too often, programmers attempt to write their own date, time or calendar manipulations. Calendar programming is very complex, and there are a number of shoals that are not clearly charted. The use of the
time and datetime modules makes our applications simpler, and much more likely to be correct. The clock
and calendar are hopelessly complex data objects, best described by these modules.
In Concepts: Point in Time and Duration, we'll look at the concepts of a point in time and a duration. Then
we'll look at the datetime module in The datetime Module. The formal definitions for parts of the module
are in Formal Definitions in datetime.
We'll look at the time module in The time module, and the formal definitions in Formal Definitions in time.
The Y2K Lessons Learned
In the late 90's, programmers like me scrambled to fix the cruft that had accumulated from decades of
sloppy date calculations. Thousands of programs shared a number of assumptions about
dates that in retrospect were really bone-headed.
Invalid assumption one. A two-digit year was sufficient. When we write 3/18/85, we only write two
digits, and we have to use context clues to figure out what the date means. In this case, if I say that it's
the birthdate of my great-grandmother, you'd know that the date meant 1885. A two-digit short-hand
year is fine for people, but unsuitable for computing. The century information is essential even if it isn't
shown to the human user of the software.
Invalid assumption two. Ordinary professional programmers can actually understand the complex and
nuanced Gregorian calendar. One example of the complexity is the leap year rule: years divisible by 4 are leap years,
except years divisible by 100, which are leap years only when they are also divisible by 400.
Invalid assumption three. The service life of software is short enough that we'll write new software
before January 1st, 2000. Software doesn't wear out; a well-written piece of software can be used for
decades. The author's personal best is 17 years of continuous service before the software was replaced
during a complete reworking of the business computer systems.
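The leap year rule from invalid assumption two is short to state and easy to get wrong. Here is one way it might be written out, as a sketch; the function name is_leap() is made up for the example. In real programs the library's calendar.isleap() does the same job and is the better choice.

```python
import calendar


def is_leap(year):
    """Gregorian leap year rule, written out step by step."""
    if year % 400 == 0:
        return True      # 1600, 2000, 2400 are leap years
    if year % 100 == 0:
        return False     # 1700, 1800, 1900 are not
    return year % 4 == 0


for year in (1900, 2000, 2008, 2009):
    print(year, is_leap(year), calendar.isleap(year))
```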
A huge technical mistake that compounded the problem was to embed date calculations in application
programs. The Python idea is the date calculations must be in a separate module, and any change or
fix can be made in one place only and will fix every program that uses the module. The real core Y2K
problem stemmed from a failure to isolate different concepts like date into separate modules.
The consequence of the assumptions and mistakes was side-tracking thousands of programmers. Instead
of creating useful solutions to data processing problems, they were remediating program after program
that had a date calculation somewhere inside it.
Sometimes we display just a time of day to a person because the person can work out the date from the
context in which information is displayed.
A point in time, unless it's a generic time used for scheduling, is part of a complete datetime object. We
may show only the time of that more complete datetime object to the person using the software.
Duration. A duration is sometimes called a delta time or offset. Durations can be measured in various
units like years, quarters, months, weeks, days, hours, minutes or seconds.
Since both a point in time and a duration can be measured in similar units, this can be confusing. Durations,
for example, aren't a specific time of day (10:08 AM), but a number of hours (10 hours, 8 minutes).
Irregularity. Note that we measure time in units which have gaps and overlaps, mostly because of the irregular
concepts of month and year. If we only used days and weeks, life would be simpler. Months and years really
throw a monkey wrench into the works.
For example, the durations of 90 days and 3 months are similar, but not exactly the same. For the
simple durations like weeks, days, hours, minutes and seconds, the conversions are simple; we can normalize
to days or seconds without any trouble or confusion.
Among the cultural times like months, quarters and years, the conversions are pretty clear. However, there
are a lot of special cases. This makes converting back and forth between simple times and cultural times
very difficult.
If we think of a duration being measured in days, then one hour is 1/24 = 0.04166 and one minute is 1/60th of
that = 0.0006944. On the other hand, if we think of a duration in seconds, then one day is 86,400 seconds.
Both views are equivalent. Our operating systems, generally, like to work in seconds. Other software, like
databases, prefer to work in days because that meets people's needs a little better than working in seconds.
Point and Duration Calculations. You can combine a timedelta and a datetime to compute a new point
in time. You can also extract a timedelta as the difference between two datetimes. Doing this correctly,
however, requires considerable care. That's why this operation is done best by the datetime package.
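The distinction between a point in time and a duration, and the gap between 90 days and 3 months, can both be seen in a few lines. This is a minimal sketch with dates chosen only for illustration.

```python
import datetime

# A point in time plus a duration gives a new point in time.
start = datetime.datetime(2009, 7, 9, 10, 8)            # 10:08 AM
duration = datetime.timedelta(hours=10, minutes=8)      # 10 hours, 8 minutes
finish = start + duration

# The difference between two points in time is a duration.
elapsed = finish - start
print(finish, elapsed)

# 90 days and 3 months are similar, but not the same.
ninety_days = datetime.date(2009, 7, 9) + datetime.timedelta(days=90)
three_months = datetime.date(2009, 10, 9)
print(ninety_days, three_months, three_months - ninety_days)
```

Starting from July 9th, 90 days lands two days short of the same day of the month three months later, because July and August both have 31 days.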
There are two overall approaches to date and time processing in Python.
The OS-friendly time module. This module has two different numeric representations for a point in
time: a time.struct_time object or a floating-point number. The module works with durations as a
floating-point number of seconds. It requires conversions between seconds and a time.struct_time
object. While low-level, this module maps directly to the portable C libraries.
The person-friendly datetime module. This module defines several classes for a point in time,
plus a class for a duration. The datetime.date, datetime.time, datetime.datetime and
datetime.timedelta classes embody considerable knowledge about the calendar, and remove the
need to do conversions among the various representations.
Unless you have a need for C-language compatibility, you should use the datetime module for all of your date and
time-related computations. We'll present datetime first.
In addition to the datetime module, the calendar module also contains useful classes and methods to
handle our Gregorian calendar. We won't look at this module, however, since it's relatively simple.
convert among the world's calendars. For details on conversions between calendar systems, see Calendrical
Calculations [Dershowitz97]. Additionally, this package also provides for a time delta, which captures the
duration between two datetimes.
One of the ingenious tricks to working with the Gregorian calendar is to assign an ordinal number to each
day. We start these numbers from an epochal date, and use algorithms to derive the year, month and day
information for that ordinal day number. Similarly, this module provides algorithms to convert a calendar
date to an ordinal day number. Following the design in [Dershowitz97], this class assigns day numbers
starting with January 1, in the (hypothetical) year 1. Since the Gregorian calendar was not defined until
1582, all dates before the official adoption are termed proleptic. This epoch date is a hypothetical date that
never really existed on any calendar, but which is used by this class.
There are four classes in this module that help us handle dates and times in a uniform and correct manner.
datetime.date An instance of datetime.date has three attributes: year, month and day. There are a
number of methods for creating datetime.dates, and converting datetime.dates to various other
forms, like floating-point timestamps and time.struct_time objects for use with the time module,
and ordinal day numbers.
datetime.datetime An instance of datetime.datetime has all the attributes for a complete date with the
time information. There are a number of methods for creating datetime.datetimes, and converting
datetime.datetimes to various other forms, like floating-point timestamps and time.struct_time
objects for use with the time module, and ordinal day numbers.
datetime.time An instance of datetime.time has four attributes: hour, minute, second and microsecond.
Additionally, it can also carry an instance of tzinfo which describes the time zone for this time.
datetime.timedelta A datetime.timedelta is the duration between two dates or datetimes. It
has a value in days, seconds and microseconds. These can be added to or subtracted from dates
or datetimes to compute new dates or datetimes. A datetime.time by itself does not support this arithmetic.
There are a number of interesting date calculation problems that we can solve with this module. We'll look
at the following recipes:
Getting Days Between Two Dates
Getting Months Between Two Dates
Computing A Date From An Offset In Days
Computing A Date From An Offset In Months
Input of Dates and Times
Getting Days Between Two Dates. Because datetime.datetime objects have the numeric operators
defined, we can create datetime.datetime objects and subtract them to get the difference in days and
seconds between the two times. The difference between two date or datetime objects is a timedelta
object.
>>> import datetime
>>> d1 = datetime.datetime.now()
>>> d2 = datetime.datetime.today()
>>> d2 - d1
datetime.timedelta(0, 14, 439322)
>>> d1
datetime.datetime(2009, 7, 9, 6, 44, 19, 45987)
>>> d2
datetime.datetime(2009, 7, 9, 6, 44, 33, 485309)
Surf http://stackoverflow.com for about a minute.
>>> d3 = datetime.datetime.now()
>>> d3 - d1
datetime.timedelta(0, 95, 848826)
The difference between d2 and d1 was the object datetime.timedelta(0, 14, 439322), which means zero
days, 14 seconds and 439,322 microseconds.
The difference between d3 and d1 was the object datetime.timedelta(0, 95, 848826), which means zero
days, 95 seconds and 848,826 microseconds.
If we said td= d3-d1, then td.days is the number of whole days between two dates or datetimes. td.seconds is
the number of seconds within the day, from 0 to 86399. The seconds attribute is zero if you get the difference
between two dates, since they have no time information. td.seconds/60/60 is the number of hours within that
partial day; the total number of hours between the two datetimes is td.days*24 + td.seconds/60/60.
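Here is the same calculation with two fixed datetimes, so the numbers are repeatable. It is a sketch only; the dates and variable names are chosen for illustration.

```python
import datetime

begin = datetime.datetime(2009, 7, 9, 6, 44, 19)
end = datetime.datetime(2009, 12, 25, 9, 14, 19)
td = end - begin

whole_days = td.days                      # complete days in the duration
whole_weeks = td.days // 7                # complete weeks
leftover_hours = td.seconds / 3600.0      # hours in the final, partial day
total_hours = td.days * 24 + td.seconds / 3600.0

print(td)
print(whole_days, whole_weeks, leftover_hours, total_hours)
```

Note that td.seconds never includes the whole days; to get a total in hours or seconds, the days have to be added back in.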
If we do td.days//7, we compute the number of whole weeks between two dates.
Getting Months Between Two Dates. The number of months, quarters or years between two dates uses the instance variables of the
datetime.datetime object. If we have two variables, begin and end, we have to compute month numbers
from the dates. A month number includes the year and month information.
We compute a month number for a date as follows:
endMonth= end.year*12+end.month
beginMonth = begin.year*12+begin.month
endMonth - beginMonth
The result is the months between the two dates. This correctly handles all issues with months in the same
or different years.
Computing A Date From An Offset In Days. To compute a date in the future
using an offset in days, we can add a timedelta object to a datetime object. The timedelta object can be
constructed with a day offset or a seconds offset or both. In the following example, we'll compute the date
which is 5 days in the future.
now= datetime.datetime.now()
now + datetime.timedelta(days=5)
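An offset in months can't be expressed as a timedelta, since months vary in length. One way it might be done is to reuse the month-number idea from Getting Months Between Two Dates in the other direction. The following is a sketch under that assumption; the helper name add_months() is made up for this example, and it keeps the same day of the month.

```python
import datetime


def add_months(start, months):
    """Return the date the given number of months after start."""
    # Combine year and month into one serial number, counting months from 0.
    month_number = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(month_number, 12)
    # replace() builds a new date and checks that it is a real date.
    return start.replace(year=year, month=month0 + 1)


print(add_months(datetime.date(2009, 7, 9), 3))
print(add_months(datetime.date(2009, 10, 15), 3))
```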
This will raise a ValueError if you try to create an invalid date like February 30th.
Note that this parallels our Computing A Date From An Offset In Months example. In both cases, we work
with a month number that combines month and year into a single serial number.
Input of Dates and Times. We get date and time information from three sources. We may ask the operating system what the
current date or time is. We may ask the person who's running our program for a date or a time. Most
commonly, we process a file which has date or time information in it. For example, we may be reading
a file of stocks with dates on which a trade occurred.
Getting a Date or Time From The OS. We get time from the OS when we want the current time, or
the timestamp associated with a system resource like a file or directory. The current time is created by
the datetime.datetime.now() object constructor. See Advanced File Exercises for examples of getting
file timestamps. When we get a file-related time, we get a floating-point number of seconds past the
epoch date. We can convert this to a proper datetime with datetime.datetime.fromtimestamp().
>>> import os
>>> import datetime
>>> mtime= os.path.getmtime( "Makefile" )
>>> datetime.datetime.fromtimestamp( mtime )
datetime.datetime(2009, 6, 9, 21, 10, 26)
Getting a Date or Time From A File. Files often have human-readable date and time information.
However, some files will have dates or times as strings of digits. For example, it might be 20070410 for
April 10, 2007. This is still a time parsing problem, and we can use datetime.datetime.strptime().
>>> someInput= "20070410" # stand-in for someFile.read()
>>> aDate = datetime.datetime.strptime( someInput, "%Y%m%d" )
>>> aDate
datetime.datetime(2007, 4, 10, 0, 0)
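The same strptime() call handles human-readable dates such as the 25-Jan-2001 style used in the stock data. The following sketch combines it with the decimal module to parse one line of text. The comma-separated layout and the function name parse_purchase() are assumptions made for this example, and the %b directive depends on the locale using English month abbreviations.

```python
import datetime
from decimal import Decimal


def parse_purchase(line):
    """Split 'date,price,shares' text into a date, a Decimal and an int."""
    date_text, price_text, shares_text = line.strip().split(",")
    purch_date = datetime.datetime.strptime(date_text, "%d-%b-%Y").date()
    return purch_date, Decimal(price_text), int(shares_text)


# Stand-in for a line read from a file.
purch_date, price, shares = parse_purchase("25-Jan-2001,44.89,17\n")
cost = price * shares
print(purch_date, price, shares, cost)
```

Keeping the price as a Decimal built from the original string means the cost is exact to the penny.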
datetime.now() → datetime.datetime
Current local date and time. If possible, supplies more precision than using a time.time() floating-point time. See utcnow().
datetime.utcnow() → datetime.datetime
Current UTC date and time. If possible, supplies more precision than using a time.time() floating-point time. See now().
datetime.fromtimestamp(timestamp) → datetime.datetime
Local date and time from the given floating-point time, like those created by time.time().
datetime.utcfromtimestamp(timestamp) → datetime.datetime
UTC date and time from the given floating-point time, like those created by time.time().
datetime.fromordinal(ordinal) → datetime.datetime
Date from the given ordinal day number. The time fields of the datetime
will be zero.
datetime.combine(date, time) → datetime.datetime
Combine date fields from date with time fields from time to create a new datetime object.
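A few of these constructors can be tried with fixed values. This is a short sketch with dates picked only for illustration; in the standard library, datetime.datetime.combine() is the class method that joins a date and a time.

```python
import datetime

# Ordinal day 1 is January 1 of the (proleptic) year 1.
day_one = datetime.datetime.fromordinal(1)

a_date = datetime.date(2007, 4, 21)
a_time = datetime.time(14, 51)

# Join a separate date and time into one datetime.
combined = datetime.datetime.combine(a_date, a_time)

# Going through the ordinal day number keeps the date and zeroes the time.
round_trip = datetime.datetime.fromordinal(a_date.toordinal())

print(day_one)
print(combined)
print(round_trip)
```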
The following methods return information about a given datetime object. In the following definitions, dt is
a datetime object.
The datetime.timedelta object holds a duration, measured in days, seconds and microseconds. There are
a number of ways of creating a datetime.timedelta. Once created, ordinary arithmetic operators like +
and - can be used between datetime.datetime and datetime.timedelta objects.
Here's a step-by-step example for displaying the current time (time.time()) using the GNU/Linux standard
format for day and time. This shows a standardized and portable way to produce a time stamp.
>>> import time
>>> now= time.time()
>>> lt = time.localtime( now )
>>> time.strftime( "%x %X", lt )
'07/09/09 07:08:14'
>>> time.strftime( "%x %X", time.localtime( time.time() ) )
'07/09/09 07:08:47'
1. The time.time() function produces the current time in UTC (Coordinated Universal Time). Time is
represented as a floating-point number of seconds after an epoch. We save this in the variable now.
2. The time.localtime() function uses the operating system's local timezone information to convert
from a floating-point timestamp in UTC to a time.struct_time object with the details of the current
local date and time. We save this in the variable lt.
3. The time.strftime() function formats a time.struct_time object. We use the formatting codes
that will do locale-specific date ("%x") and time ("%X") formatting. This allows the operating system's
localization features to specify the format for date and time, assuring that the user's preferences are
honored.
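The "%x %X" format changes with the user's locale, which is what you want for display but not for log files or data that other programs will read. For those cases an explicit format can be spelled out. The following sketch formats the epoch itself, assuming the usual epoch of January 1, 1970 (UTC); the variable names are made up for the example.

```python
import time

# Timestamp 0.0 is the epoch; gmtime() expresses it in UTC.
epoch = time.gmtime(0)
stamp = time.strftime("%Y-%m-%d %H:%M:%S", epoch)
print(stamp)

# The locale-specific form, for comparison; its layout varies by system.
local_stamp = time.strftime("%x %X", time.localtime(time.time()))
print(local_stamp)
```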
There are a number of interesting date calculation recipes that apply to the time module.
Getting Days Between Two Dates
Getting Months Between Two Dates
Computing A Date From An Offset In Days
Computing A Date From An Offset In Months
Input of Dates and Times
These are some common recipes for date arithmetic.
Getting Days Between Two Dates. To get
the number of days between two dates, we calculate the difference between two floating-point timestamp
representations of points in time. When we subtract these values we get seconds between two dates. Since
there are 86,400 seconds in a day, we can convert this number of seconds to a number of days, weeks, hours
or even minutes.
>>> import time
>>> now= time.time()
Surf http://stackoverflow.com for about 5 minutes.
>>> d2 = time.time()
>>> d2 - now
275.53438711166382
>>> (d2 - now) / 60
4.5922397851943968
The difference is in seconds. When we divide by 60, that's the difference in minutes. When we divide by
86400, that's the difference in days.
Getting Months Between Two Dates. To get the number of
months, quarters or years between two dates, we use the time.struct_time objects.
>>> start = time.localtime( prior_time )
>>> end = time.localtime( time.time() )
>>> start
(2009, 7, 9, 7, 8, 14, 3, 190, 1)
>>> end
(2009, 7, 9, 7, 17, 14, 3, 190, 1)
In this case, we've created start and end using time.localtime() conversions. We could also create the
time.struct_time objects from parsing user input.
Given two time.struct_time objects, start and end, we must compute month numbers that combine year
and month into a single integer value that we can process correctly.
Computing A Date From An Offset In Days. To compute a date in the future using weeks or days, we can add an appropriate offset to
a floating-point timestamp value. Since the floating-point timestamp is in seconds, a number of days must
be multiplied by 86,400 to convert it to seconds. A week is 7 × 86,400 = 604,800 seconds long.
>>> next_week = 7*86400 + time.time()
>>> time.localtime( next_week )
(2009, 7, 16, 7, 23, 21, 3, 197, 1)
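To make the arithmetic repeatable, the same recipe can be run from a fixed timestamp and displayed in UTC. This is a sketch only; the sample timestamp is an assumption chosen to fall on midnight UTC of July 9, 2009, and it relies on the usual 1970 epoch.

```python
import time

SECONDS_PER_DAY = 86400

start = 1247097600.0                         # sample timestamp, in seconds
next_week = start + 7 * SECONDS_PER_DAY      # one week is 604,800 seconds

# Convert both timestamps to UTC structures to read off the dates.
start_struct = time.gmtime(start)
next_struct = time.gmtime(next_week)
print(start_struct[:3], next_struct[:3])

# Going the other way: seconds back to days.
days_between = (next_week - start) / SECONDS_PER_DAY
print(days_between)
```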
Computing A Date From An Offset In Months. To compute a date in the future using a number of
months or years, we have to create the time.struct_time object for the base date, and then update selected
elements of the tuple. Once we've updated the structure, we can then convert it back to a floating-point
timestamp value using time.mktime().
Note that we have to be careful to handle the year correctly. The easiest way to be sure this is done correctly
is to do the following:
1. Create a month number from the starting year and month, y*12+(m-1). Counting months from zero keeps December in the correct year.
2. Add a number of months (or 12 times the number of years) to this month number.
3. Extract the year and month from the resulting value by dividing by 12 (to get the new year) and using
the remainder, plus 1, as the new month.
>>> import time
>>> now= time.localtime( time.time() )
>>> thisYM= now.tm_year*12+now.tm_mon-1
>>> nextYM= thisYM+3
>>> dueYear, dueMonth =nextYM//12, nextYM%12+1
>>> nextSec= time.mktime( (dueYear,dueMonth,now.tm_mday,0,0,0,0,0,0) )
>>> time.localtime( nextSec )
(2009, 10, 9, 1, 0, 0, 4, 282, 1)
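The month-number steps can be packaged as a small function so the year boundary cases are easy to check. This is a sketch; the function name shift_year_month() is made up for the example, and noon is used for the time of day only to stay clear of any daylight-saving changeover.

```python
import time


def shift_year_month(year, month, months):
    """Return (year, month) after moving forward by a number of months."""
    month_number = year * 12 + (month - 1) + months   # months counted from 0
    return month_number // 12, month_number % 12 + 1


print(shift_year_month(2009, 7, 3))
print(shift_year_month(2009, 11, 3))

# Use the result to build a timestamp; -1 lets mktime() work out
# the weekday, the day of the year and the DST flag.
due_year, due_month = shift_year_month(2009, 7, 3)
due_seconds = time.mktime((due_year, due_month, 9, 12, 0, 0, -1, -1, -1))
due = time.localtime(due_seconds)
print(due[:3])
```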
Input of Dates and Times. When we get a time value, it's generally in one of two forms. Sometimes a
time value is represented as a number, other times it's represented as a string.
Getting a Date or Time From The OS. We often get the OS time when we want the current
time, or the timestamp associated with a system resource like a file or directory. The current time is
available as the time.time() function.
See Advanced File Exercises for examples of getting file timestamps. When we get a file-related time,
the OS gives us a floating-point number of seconds past the epoch date. There are two kinds of
processing: simple display and time calculation.
To display an OS time, we need to convert the floating-point timestamp to a time.struct_time. We
use time.localtime() or time.gmtime() to make this conversion. Once we have a time.struct_time,
we use time.strftime() or time.asctime() to format and display the time.
>>> import os
>>> import time
>>> mtime= os.path.getmtime( "Makefile" )
>>> time.localtime( mtime )
(2009, 6, 9, 21, 10, 26, 1, 160, 1)
Getting A Date or Time From A User. Human-readable time information generally has to be
parsed from a string. Human-readable time can include any of the endless variety of formats with some
combination of years, days, months, hours, minutes and seconds. In this case, we have to first parse
the time, creating a time.struct_time. The simplest parsing is done with time.strptime().
Here's an example of parsing an input string from a person. This will create a time.struct_time called
theDate.
>>> import time
>>> someInput= "3/18/85" # stand-in for raw_input()
>>> theDate= time.strptime( someInput, "%m/%d/%y" )
>>> theDate
(1985, 3, 18, 0, 0, 0, 0, 77, -1)
Getting a Date or Time From A File. Files often have human-readable date and time information.
However, some files will have dates or times as strings of digits. For example, it might be 20070410 for
April 10, 2007. This is still a time parsing problem, and we can use time.strptime() to pick apart
the various fields. We can parse the 8-character string as follows.
>>> import time
>>> someInput= "20070410" # stand-in for someFile.read()
>>> time.strptime( someInput, "%Y%m%d" )
(2007, 4, 10, 0, 0, 0, 1, 100, -1)
tm_yday Day of the year (1-366). This can be hard to figure out when you're creating a new
time. Fortunately, you can supply -1, and the time.mktime() function will determine the
day of the year correctly.
tm_isdst Daylight savings time flag: 0 is the regular time zone; 1 is the DST time zone. -1
is a value you can set when you create a time for mktime(), indicating that mktime()
should determine DST based on the date and time.
Working With struct_time Objects. The time module includes the following functions that create a
time.struct_time object. The source timestamp can be a floating-point seconds-past-the-epoch value or
a formatted string.
time.gmtime(seconds) → time.struct_time
Convert a timestamp with seconds since the epoch to a time.struct_time object expressing UTC
(a.k.a. GMT).
time.localtime(seconds) → time.struct_time
Convert a timestamp with seconds since the epoch to a time.struct_time expressing local time.
time.strptime(string, format) → time.struct_time
Parse the string using the given format to create a time.struct_time object expressing the given time
string. The format parameter uses the same directives as those used by time.strftime(); it defaults
to "%a %b %d %H:%M:%S %Y" which matches the formatting returned by the time.ctime() function.
If the input can't be parsed according to the format, a ValueError is raised.
someInput= "3/18/85" # stand-in for raw_input()
theDate= time.strptime( someInput, "%m/%d/%y" )
print(theDate)
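Since input from a person is often malformed, it usually pays to catch the ValueError. Here is one way that might look, as a sketch; the function name parse_date() and the choice to return None for bad input are assumptions made for this example.

```python
import time


def parse_date(text, format="%m/%d/%y"):
    """Return a time.struct_time, or None if text doesn't match format."""
    try:
        return time.strptime(text, format)
    except ValueError:
        return None


good = parse_date("3/18/85")           # stand-in for input from a person
bad = parse_date("18 March 1985")      # wrong layout for this format

print(good[:3] if good else "could not parse")
print(bad[:3] if bad else "could not parse")
```

A real program would probably ask the person to try again, or try a second format, rather than silently carrying None forward.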
When you want to create a proper time.struct_time object, you'll find that there are a few fields for which
you don't know the initial values. For example, it's common to know year, month, day of month, hour,
minute and second. It's rare to know the day of the year or the day of the week. Consequently, you have to
do the following little two-step dance to create and initialize a time.struct_time.
In this example, we'll create a proper structure for 4/21/2007 at 2:51 PM. We can fill in six of the nine
values in a time.struct_time tuple. We just throw -1 in for the remaining values.
ts= time.localtime( time.mktime( (2007,4,21,14,51,00,-1,-1,-1) ) )
The value for ts, (2007, 4, 21, 14, 51, 0, 5, 111, 1), has the day of week (0 is Monday, 5 is Saturday)
and day of year (111) filled in correctly.
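As a small sketch of the same two-step dance, the filled-in fields can also be read by name. The attribute names tm_wday and tm_yday are the standard ones on a time.struct_time; the date is the one used in the text.

```python
import time

# Six known fields, then -1 for weekday, day of year and the DST flag.
ts = time.localtime(time.mktime((2007, 4, 21, 14, 51, 0, -1, -1, -1)))

print(ts.tm_wday)   # day of week, where 0 is Monday
print(ts.tm_yday)   # day of year
print(ts[0:3])      # the object can still be sliced like a tuple
```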
Working With Floating-Point Time. The time module includes the following functions that create a
floating-point seconds-past-the-epoch value. This value can be generated from the operating system, or
converted from a time.struct_time object.
Because a floating-point time value is a simple floating-point number, you can perform any mathematical
operations you want on that number. Since it is in seconds, you can divide by 86,400 to convert it to days.
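The arithmetic can be sketched like this. The two dates are arbitrary examples, and rounding is used because a daylight savings change between the dates can shift the difference by an hour.

```python
import time

SECONDS_PER_DAY = 86400

# Two timestamps built with mktime(); noon keeps us away from midnight edge effects.
start = time.mktime((2007, 1, 1, 12, 0, 0, -1, -1, -1))
end = time.mktime((2007, 6, 1, 12, 0, 0, -1, -1, -1))

elapsed_days = (end - start) / SECONDS_PER_DAY
print(round(elapsed_days))
```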
time.time() → float
Return the current timestamp in seconds since the Epoch. Fractions of a second may be present if the
system clock provides them.
now= time.time()
time.mktime(struct_time) → float
Convert a time.struct_time object to seconds since the epoch. The weekday and day of the year
fields can be set to -1 on input, since they aren't necessary. However, the DST field is used.
14.3. Time and Date Processing : The time and datetime Modules
In this example, we convert a time.struct_time object into a list so we can update it. Then we can
make a floating-point time from the updated structure.
lt= time.localtime( time.time() )
nt= list( lt )
nt[1] += 3 # add three to months attribute
future= time.mktime( tuple(nt) ) # mktime() expects a tuple or struct_time
Working with String Time. The following functions of the time module create time as a formatted string,
suitable for display or writing to a log file.
time.strftime(format, struct_time) → string
Convert the time.struct_time object to a string according to the format specification. A
format of "%x %X" produces a date and time.
lt= time.localtime( time.time() )
print(time.strftime( "%Y-%m-%dT%H:%M:%S", lt))
time.asctime(struct_time) → string
Convert a time.struct_time to a string, e.g. Sat Jun 06 16:26:11 1998. This is the same as using the
format string "%a %b %d %H:%M:%S %Y".
time.ctime(seconds) → string
Convert a floating-point time in seconds since the Epoch to a string in local time. This is equivalent
to time.asctime(time.localtime(seconds)).
If no time is given, use the current time. time.ctime() does the same thing as time.asctime(
time.localtime( time.time() ) ).
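A quick way to convince yourself of this equivalence is to try both on the same timestamp. This is only a sketch; the variable names are arbitrary.

```python
import time

stamp = time.time()
as_ctime = time.ctime(stamp)
as_asctime = time.asctime(time.localtime(stamp))

print(as_ctime)
print(as_ctime == as_asctime)
```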
Additional Functions and Variables. These are some additional functions and variables in the time
module.
The following function appears in the time module. The sched module reflects a better approach to time-dependent
processing.
time.sleep(seconds)
Delay execution for a given number of seconds. The argument may be a floating-point number for
subsecond precision. Operating system scheduling vagaries and interrupt handling make this function
imprecise.
time.clock() → float
Return the CPU time or real time since the start of the process or since the first call to clock(). This
has as much precision as the system is capable of recording.
The following variables are part of the time module. They describe the current locale.
time.accept2dyear If non-zero, 2-digit years are accepted. 69-99 is treated as 1969 to 1999, 0
to 68 is treated as 2000 to 2068. This is 1 by default, unless the PYTHONY2K environment
variable is set; then this variable will be zero.
time.altzone Difference in seconds between UTC and local Daylight Savings time. Often a
multiple of 3600 (all US time zones are in whole hours). For example, Eastern Daylight
Time is 14400 (4 hours).
time.daylight Non-zero if the locale uses daylight savings time. Zero if it does not. Your
operating system has ways to define your locale.
time.timezone Difference in seconds between UTC and local Standard time. Often a multiple
of 3600 (all US timezones are in whole hours). Your operating system has ways to define
your locale.
time.tzname The name of the timezone.
Conversion Specifications. When we looked at Strings, in Sequences of Characters : str and Unicode, we
looked at the % operator which formats a message using a template and specific values. The strftime() and
strptime() functions also use a number of conversion specifications to convert between time.struct_time
and strings.
The following examples show a particular date (Saturday, August 4th) formatted with each of the formatting
strings.
Table 14.1: Overall Formatting
    %c    %x    %X    %%

    Directive    Example
    %a           Sat
    %A           Saturday
    %b           Aug
    %B           August
    %d           04
    %j           216
    %m           08
    %U           30
    %w           6
    %W           31
    %y           01
    %Y           2001
    %H           17
    %I           05
    %M           11
    %p           pm
    %S           20
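The numeric directives can be checked with a short script. This is a sketch that assumes the example date is Saturday, August 4th, 2001 at 5:11:20 PM; the nine-item tuple is filled in by hand, including weekday 5 and day-of-year 216.

```python
import time

# Saturday, August 4th, 2001 at 5:11:20 PM as a nine-item time tuple.
when = (2001, 8, 4, 17, 11, 20, 5, 216, -1)

numeric = "%d %j %m %U %w %W %y %Y %H %I %M %S"
formatted = time.strftime(numeric, when)
print(formatted)

# Day names, month names and AM/PM text depend on the locale, so they can vary.
print(time.strftime("%a %A %b %B %p", when))
```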
See Defining New Objects exercises. This calculation is a collaboration between each block of
ShareBlock and Position. As with the value calculations, a block-by-block calculation is added
to ShareBlock and a higher-level reduction algorithm is used in Position.
The annualization requires computing the duration of stock ownership. The essential feature here is
to parse the date string to create a time object and then get the number of days between two time
objects.
Given the sale date, purchase date, sale price, sp, and purchase price, pp.
Compute the period the asset was held. There are two choices:
Use time.mktime() to create floating-point time values for sale date, s, and purchase date, p.
The weighting, w, is computed as
w= (86400*365.2425) / ( s - p )
Use datetime.datetime() to create datetime.datetime objects for the sale date, s, and purchase
date, p. The weighting, w, is computed as the following, which truncates the difference to whole
days.
w= 365.2425 / ( s - p ).days
In this example, timeObj1 and timeObj2 are time structures with details parsed from the date string
by time.strptime(). The dayNumb1 and dayNumb2 are day numbers that correspond to these times.
Time is measured in seconds after an epoch; typically January 1, 1970. The exact value doesn't matter;
what matters is that the epoch is applied consistently by mktime(). We divide this by 24 hours per
day, 60 minutes per hour and 60 seconds per minute to get days after the epoch instead of seconds.
Given two day numbers, the difference is the number of days between the two dates. In this case, there
are 151 days between the two dates.
If we held the stock for 151 days, that is .413 years. If the return for the 151 days was 3.25%, the
return for the whole year could have been 7.86%. In order to provide a rational basis for comparison,
we use this annualized ROI instead of ROIs over different durations.
All of this processing must be encapsulated into a method that computes the ownership duration.
time.ownedFor(saleDate) → float
This method computes the days the stock was owned.
time.annualizedROI(salePrice, saleDate) → float
We would need to add an annualizedROI() method to the ShareBlock that divides the gross
ROI by the duration in years to return the annualized ROI. Similarly, we would add a method
to the Position to use the annualizedROI() to compute a weighted average which is the
annualized ROI for the entire position.
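The duration arithmetic alone can be sketched as follows; this is not a full solution to the exercise. The function name years_held(), the date layout and the two sample dates are assumptions chosen so that the dates are 151 days apart, as in the discussion above.

```python
import datetime

def years_held(purchase_text, sale_text, layout="%m/%d/%Y"):
    """Duration between two date strings, in years."""
    p = datetime.datetime.strptime(purchase_text, layout)
    s = datetime.datetime.strptime(sale_text, layout)
    return (s - p).days / 365.2425

years = years_held("1/1/2007", "6/1/2007")   # 151 days apart
roi = 3.25                                    # percent return over the holding period
annualized = roi / years
print(round(years, 3), round(annualized, 2))
```

Wrapping this in ShareBlock and Position methods is still left as the exercise.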
2. Date of Easter.
The following algorithm is based on one by Gauss, and published in Dershowitz and Reingold, Calendrical Calculations [Dershowitz97].
Date of Easter after 1582
The time-as-seconds representation is a duration that's technically very simple. It turns out that a point
in time can be viewed as a duration measured against an epochal date. Everything's a floating-point
number. Not much can go wrong.
However, people like to see their calendar, not a number of seconds past an epochal date. So the
time.struct_time was added just to make it easy to display time values or accept time-oriented
inputs. Further, for business rules that involve months, the time.struct_time information is useful.
Both time-as-seconds, and time-as-structure are required. Some programs will use one representation
more than the other.
For example, a file may contain lines like "Birth Date: 3/8/87" or "Birth Date: 12/02/87". When
we're reading lines like these from the file, we may do any of the following.
Matching to determine that the string has the right pattern of text. In this case, a matching pattern
might be "Birth Date:" followed by digits / digits / digits. A match pattern must be found at the
beginning of the target string.
Searching for the date pattern. A searching pattern could be digits / digits / digits. A search pattern
can be found anywhere within the string.
We may further parse the date string to extract groups of digits for month, day and year. A parsing
pattern can separate the various digit groups from the surrounding context.
We can accomplish these matching, searching and parsing operations with the re module in Python. A
regular expression (RE) is a rule or pattern used for matching, searching and parsing strings.
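As a preview, here is a minimal sketch of all three operations on one sample line. The patterns are illustrative choices; the pattern language itself is explained later in this section.

```python
import re

line = "Birth Date: 3/8/87"

# Matching: the pattern must be found at the beginning of the string.
matched = re.match(r"Birth Date: *\d+/\d+/\d+", line)

# Searching: the date pattern may appear anywhere in the string.
found = re.search(r"\d+/\d+/\d+", line)

# Parsing: () groups pull out month, day and year.
parsed = re.search(r"(\d+)/(\d+)/(\d+)", line)
month, day, year = parsed.groups()
print(found.group(), month, day, year)
```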
The Filename Wildcard. The fairly simple wild-card filename matching rules are kinds of regular
expressions, also. These rules are embodied in two packages that we looked at in Advanced File Exercises:
fnmatch and glob.
The filename regular expressions in fnmatch and glob use special characters that don't have their usual literal
meaning. When we write a glob pattern, most characters simply match themselves. However, the * character
matches any sequence of characters in a file name. The ? character matches any single character in a file
name.
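A short sketch of these two wildcard characters, using fnmatch.filter() on a made-up list of file names:

```python
import fnmatch

names = ["report.txt", "notes.txt", "data1.csv", "data22.csv"]

# * matches any sequence of characters
text_files = fnmatch.filter(names, "*.txt")
# ? matches exactly one character
one_digit = fnmatch.filter(names, "data?.csv")

print(text_files)
print(one_digit)
```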
The re module provides considerably more sophisticated pattern matching capabilities than these simple
rules. It uses the same principle: some punctuation marks have special meanings as part of pattern specification.
File Searching. An example program which does this is called grep. This is a GNU/Linux application
program; the name means Global Regular Expression Print. (Windows users may be familiar with the
findstr DOS command, which does approximately the same thing.)
The grep (or findstr) program reads one or more files, searches for lines that match a given regular expression
and prints the matching lines.
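The heart of a grep-like program can be sketched in a few lines of Python. This version works on a list of strings rather than on files, and the function name grep() is simply borrowed from the program it imitates.

```python
import re

def grep(pattern, lines):
    """Return the lines in which the regular expression is found."""
    compiled = re.compile(pattern)
    return [line for line in lines if compiled.search(line)]

sample = [
    "Name: Pat",
    "Birth Date: 3/8/87",
    "City: Springfield",
    "Birth Date: 12/02/87",
]
for line in grep(r"\d+/\d+/\d+", sample):
    print(line)
```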
Using Regular Expressions. The general recipe for using regular expressions in your program is the
following.
1. Be sure to include import re.
2. Define the pattern string. We write patterns as string constants in our program.
3. Evaluate the re.compile() function to create a re.Pattern object. This re.compile() function is a
factory that creates usable pattern objects from our original pattern strings. The pattern object will
do the real work of matching a target string against your regular expression.
Usually we combine the pattern and the compile.
>>> date_pattern = re.compile( "Birth Date: +(.*)" )
4. Use the re.Pattern object to match or search the candidate strings. The result of a successful match
or search will be a re.Match object. In a sense, the Pattern object is a factory that creates Match
objects from string input.
>>> match = date_pattern.match( "Should Not Match" )
>>> match
>>> match = date_pattern.match( "Birth Date: 3/8/87" )
>>> match
<_sre.SRE_Match object at 0x82e60>
When a string doesn't match the pattern, the pattern object returns None (which is equivalent to
False).
A successful match creates a re.Match object; and any object is equivalent to True. We can use the
match object in an if statement.
5. Use the information in the re.Match object to parse the string. The match object is only created for
a successful match, and provides the details to help us work with the original string.
>>> match.group()
'Birth Date: 3/8/87'
>>> match.group(1)
'3/8/87'
>>> match.groups()
('3/8/87',)
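The five steps above can be gathered into one short script. The function name birth_date() is a made-up example; the pattern is the one used in the steps.

```python
import re                                          # step 1

date_pattern = re.compile("Birth Date: +(.*)")     # steps 2 and 3

def birth_date(line):
    """Return the date text from a Birth Date line, or None."""
    match = date_pattern.match(line)               # step 4
    if match:                                      # a Match object counts as True
        return match.group(1)                      # step 5
    return None

print(birth_date("Birth Date: 3/8/87"))
print(birth_date("Should Not Match"))
```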
Pattern as Production Rule. One way to look at a regular expression is as a production rule for
constructing strings. You can think of the pattern as the rule producing a giant collection of all possible
strings.
When you use the pattern to match a target, you're looking for your target string in that giant set of
possibilities.
As a practical matter, the Regular Expression module doesn't actually enumerate all of the strings that a
pattern describes. The set of possible strings could be infinite.
Pragmatically, the match algorithm looks at each clause of your regular expression pattern and locates the
matching characters in the candidate string. If the next character in the string matches the next clause
in the regular expression rule, the algorithm goes forward. When the algorithm runs out of clauses in the
pattern, it has found a match.
In many cases, a clause in the pattern will have alternatives. In this case, the algorithm places bookmarks
in the target string and pattern at each alternative choice. If the next character in the target string doesn't
match the next clause in the pattern, then the algorithm backtracks and tries a different choice in the pattern.
In this way, the matching tries out the various alternatives in the pattern, looking for some way to match
the entire pattern against the string.
For example, a Regular Expression pattern could be "aba". This production rule describes a string created
from a, followed by b, followed by a. This simple rule only builds one possible string; consequently, only
candidate strings that begin with the exact sequence "aba" will be found by the pattern's match() method.
A more complex RE pattern could be "ab*a". This production rule describes a string created from a,
followed by any number of b, followed by a. This describes an infinite set of strings including "aa", "aba",
"abba", etc. Note that the phrase "any number of" includes zero. That's why "aa" matches: it has zero b.
Note that the * character means "repeat the previous RE". This is different from the way fnmatch works.
We'll explore the special characters in the re module in detail in the next section.
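Here is a small sketch of the "ab*a" rule, with a filename wildcard shown alongside for contrast. The candidate strings are arbitrary examples.

```python
import re
import fnmatch

pattern = re.compile("ab*a")
for candidate in ["aa", "aba", "abba", "abXa"]:
    print(candidate, pattern.match(candidate) is not None)

# In a filename wildcard, * stands for any characters all by itself.
print(fnmatch.fnmatch("abXa", "ab*a"))
```

The regular expression rejects "abXa" because X is not a b; the wildcard accepts it because * alone covers any characters.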
Some characters, however, have special meanings. Mostly these are punctuation marks; they don't match
a character, but they are a pattern or a modification to a previous pattern. For example, a . in a pattern
doesn't match only the period character; it matches any single character.
But what if we want to match a .? We must escape that special meaning by using a \ in front of the
character. For example, \. escapes the special meaning that . normally has; it creates a single-character
RE that matches only the character ".".
Additionally, some ordinary characters can be made special with the escape character, \. For instance, \d
does not match d, it matches any digit; \s does not match s, it matches any whitespace character.
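These escape rules are easy to try out. In this sketch, each entry in answers records whether the pattern matched; the candidate strings are arbitrary.

```python
import re

answers = [
    re.match(r"3.14", "3x14") is not None,    # . matches any character
    re.match(r"3\.14", "3x14") is not None,   # \. matches only a period
    re.match(r"3\.14", "3.14") is not None,
    re.match(r"\d\s\d", "4 2") is not None,   # digit, whitespace, digit
    re.match(r"\d\s\d", "d s") is not None,   # letters d and s are not special here
]
print(answers)
```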
Any ordinary character, by itself, is a RE. Example: "a" is a RE that matches the character a in the
candidate string. While trivial, it is critical to know that each ordinary character is a stand-alone RE. A
special character is an RE when it is escaped with \. For example, . and * are special characters, but \.
and \* are simple one-character REs.
The special character . is a RE that matches any single character. Example: "x.z" is a RE that
matches strings like "xaz" or "x9z", but doesn't match strings like "xabz" or "xz".
The special characters [] create a RE that matches any one of the characters in a set defined
by the characters in the []. Example: "x[abc]z" matches any of "xaz", "xbz" or "xcz".
A range of characters can be specified using a -. The characters before and after the - must be
in proper order. For example "x[1-9]z".
Multiple ranges are allowed, for example "x[A-Za-z]z".
Here's a common RE that matches a letter followed by a letter, digit or _:
"[A-Za-z][A-Za-z0-9_]".
To include a -, it must be the first or last character in the []s. If - is not first or last, then it
indicates a range of characters.
To include a ^, it must not be the first character in the []s. If ^ is first, it modifies the meaning of the []s.
Some common sets of characters have shorter names. [0-9] can be abbreviated \d (d for digit).
[ \t\n\r\f\v] can be abbreviated \s (s for space). [a-zA-Z0-9_] can be abbreviated \w (w
for word).
The special character ^ modifies the brackets [^...]. This creates an RE that matches any character
except those between the []s. Example: "a[^xyz]b" matches strings like "a9b" and "a$b", but doesn't match
"axb". As with [], a range can be specified and multiple ranges can be specified.
To include a -, it must be the first or last character in the []s.
Some common sets of characters have shorter names. \D (D for non-digits) is the same as [^0-9],
the opposite of \d. \S (S for non-space) is the same as [^ \t\n\r\f\v], the opposite of \s. \W
(W for non-word) is the same as [^a-zA-Z0-9_], the opposite of \w.
An RE can be formed from concatenating REs. Example: "a.b" is three regular expressions, the
first matches a, the second matches any character, the third matches b. While this may seem obvious, it's
a necessary rule that helps us figure out which REs are modified by the * or | operators.
This is perhaps the most important rule for defining regular expressions. This rule tells us that
we can put any number of one-part regular expressions together in a sequence to make a new,
longer RE.
Note that there's no special character that puts REs together; the sequence of REs is implied.
This is similar to the way mathematicians imply multiplication by writing symbols next to each
other. For example, 2r means 2 × r.
An RE can be a group of REs with (). This creates a single RE that is composed of multiple parts.
Also, this defines parts of the string that will be captured by the Match object.
14.4. Text Processing and Pattern Matching : The re Module
Example: "(ab)c" is a regular expression composed of two regular expressions: "(ab)" (which,
in turn, is composed of two REs) and "c". This matches the string "abc". This grouping is
used with the repetition operators (*, +, ?) shown below, and the alternative operator, |.
() also identify REs for parsing purposes. The elements matched within () are remembered by
the regular expression processor and set aside in the resulting Match object. By saving matched
characters, we can decompose a string into useful groups.
An RE can be repeated using *, + or ?. Several repeat constructs are available: "x*" repeats x zero or
more times; "y+" repeats y 1 or more times; "z?" repeats z zero times or once, making the previous RE optional.
Example: "1(abc)*2" matches "12" (zero copies of abc) or "1abc2" or "1abcabc2", etc. Since
the (abc) part of this pattern uses ()s, the sequence of expressions is repeated as a whole. The
first match, against "12", is often surprising; but there are zero copies of abc between 1 and 2.
Example: "1[abc]*2" matches "12" or "1a2" or "1b2" or "1abacab2", etc. Since the [abc]
part of this pattern uses []s, any one of the characters in the [] will match. The first match,
against "12", is often surprising; but there are zero instances of any of the abc character set
between 1 and 2.
Two REs are alternatives, using |. The alternative construct allows you to combine a number of
different rules into a single pattern. For example, you might have two allowed forms for dates: mm/dd/yyyy
or dd-mon-yyyy. You might write the following pattern: r"(\d+/\d+/\d+)|(\d+-\w+-\d+)" to match either
alternative.
The character ^ is an RE that only matches the beginning of the line.
The character $ is an RE that only matches the end of the line.
Example: "^$" matches a completely empty line.
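Several of the rules above can be checked with one small table-driven script. This is only a sketch: each entry pairs a pattern from the examples with a candidate string and the result we expect from match().

```python
import re

# Each entry: (pattern, candidate, whether match() should succeed)
checks = [
    (r"x.z", "xaz", True),
    (r"x.z", "xz", False),
    (r"x[abc]z", "xbz", True),
    (r"x[1-9]z", "x0z", False),
    (r"a[^xyz]b", "a9b", True),
    (r"a[^xyz]b", "axb", False),
    (r"1(abc)*2", "12", True),
    (r"1(abc)*2", "1abcabc2", True),
    (r"1[abc]*2", "1abacab2", True),
    (r"^$", "", True),
]
results = [(re.match(p, s) is not None) == expected for p, s, expected in checks]
print(all(results))
```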
4. We apply our pattern against the candidate string "9/10/56". This creates a Match object, which
means that the string matched the pattern. When we evaluate the groups() method of a Match
object, we get a tuple of the () groups in the pattern. The first set of ()s matched 9/10/56. The
second set of ()s didn't match anything.
5. We apply our pattern against the candidate string "10-sep-56". This creates a Match object, which
means that the string matched the pattern. When we evaluate the groups() method of a Match object,
we get a tuple of the () groups in the pattern. The first set of ()s didn't match anything. The second
set of ()s matched 10-sep-56.
6. We apply our pattern against the candidate string "hi mom". The response is None, which isn't shown.
Because this expression did not create a Match object, it means that the string did not match the
pattern.
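The steps just described can be sketched as a script, using the two-alternative date pattern and the three candidate strings. The variable name date_pat is an arbitrary choice.

```python
import re

date_pat = re.compile(r"(\d+/\d+/\d+)|(\d+-\w+-\d+)")

for candidate in ["9/10/56", "10-sep-56", "hi mom"]:
    match = date_pat.match(candidate)
    if match:
        # A group that took no part in the match shows up as None.
        print(candidate, match.groups())
    else:
        print(candidate, "did not match")
```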
Match A Property File Line. This pattern matches the kind of line that is often found in a properties
file or a configuration file.
"\s*(\w+)\s*[:=]\s*(.*)"
This pattern matches one or more digits with (\d+), a :, one or more digits, a :, and digits followed by
an optional . and zero or more other digits. For example "20:07:13.2" would match, as would "13:04:05".
Further, the ()s would allow separating the digit strings for conversion and further processing. Again, the
punctuation marks are quietly dropped, since we only want to process the numbers.
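Here is a minimal sketch that tries the property-line pattern on a made-up line, followed by a time pattern. The time pattern is an assumption: it is one pattern consistent with the description above, not necessarily the original.

```python
import re

# The property-line pattern, applied to a sample line.
prop_pat = re.compile(r"\s*(\w+)\s*[:=]\s*(.*)")
name, value = prop_pat.match("color = blue").groups()
print(name, value)

# Assumed time pattern: digits, a colon, digits, a colon, digits with an optional fraction.
time_pat = re.compile(r"(\d+):(\d+):(\d+\.?\d*)")
parsed = []
for candidate in ["20:07:13.2", "13:04:05"]:
    hours, minutes, seconds = time_pat.match(candidate).groups()
    parsed.append((int(hours), int(minutes), float(seconds)))
print(parsed)
```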
A Python Identifier. This is a pattern which defines a Python identifier.
"[_A-Za-z][_A-Za-z0-9]*"
This embodies the rule of starting with a letter or _, and containing letters, digits or _.
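A sketch of using an identifier pattern to test a whole string follows. The digits here run from 0 to 9, and the helper name is_identifier() is made up for this example. Because match() only anchors the beginning of the string, the helper also checks that the matched text is the entire candidate.

```python
import re

ident_pat = re.compile(r"[_A-Za-z][_A-Za-z0-9]*")

def is_identifier(text):
    """True if the whole of text is matched by the identifier pattern."""
    match = ident_pat.match(text)
    return match is not None and match.group() == text

for candidate in ["x0", "_private", "2fast", "two words"]:
    print(candidate, is_identifier(candidate))
```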
"^\s*import\s"
The pattern above matches a Python import statement. It matches the beginning of the line with ^; it
matches zero or more whitespace characters with \s*; it matches the sequence of letters import; it matches
one whitespace character with \s, and ignores the rest of the line.
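The import pattern can be tried against a few made-up source lines. Note how the final \s keeps a word like important from being mistaken for an import statement.

```python
import re

import_pat = re.compile(r"^\s*import\s")

lines = ["import re", "    import time", "from os import path", "important = 1"]
flags = [import_pat.match(line) is not None for line in lines]
print(flags)
```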
This object can then be used to match or search candidate strings. A successful match returns a Match
object with details of the matching substring.
Here's the formal definition for the re.compile() function of the re package. This translates an RE string
into a Pattern object that can be used to search a string or match a string.
re.compile(string) → Pattern
Create a Pattern object from an RE string. The object that results is for use in searching or matching;
it has several methods, including match() and search().
The following example shows the pattern r"(\d\d):(\d\d)" which should match strings which have two
digits, a :, and two digits. We'll match the candidate string "23:59", which produces a Match object.
When we try to match the string "hi mom", we get a result of None.
>>> import re
>>> hhmm_pat= re.compile( r"(\d\d):(\d\d)" )
>>> hhmm_pat.match( "23:59" )
<_sre.SRE_Match object at 0x68d58>
>>> _.groups()
('23', '59')
>>> hhmm_pat.match( "hi mom" )
There are some other options available for re.compile(); see the Python Library Reference, [PythonLib]
section 4.2, for more information.
The raw string notation (r"pattern") is generally used to simplify the \ characters required. Without the raw
notation, each \ in the string would have to be escaped by a \, making it \\. This rapidly gets cumbersome.
Important: Confusing Class Names
As you work through the various examples, you'll see that type() claims the object class names are
SRE_Pattern and SRE_Match. We've fudged the class names in the book to make the explanation simpler.
Also, in the future, there may be other, alternative RE packages, and the class names may be slightly
different.
When we say import re, clearly something in the re module is then importing and using a module named
_sre.
We don't need to know much more than this. That's why the names don't precisely match what we think
they should say based on other, simpler, Python modules.
The following methods are part of a compiled Pattern. Assume that we assigned the pattern to the variable
pattern, via a statement like pattern = re.compile(...).
class re.Pattern
pattern.match(string) → Match
Match a candidate string against the compiled regular expression, pattern. Matching means that the regular
expression and the candidate string must match, starting at the beginning of the candidate string. A
Match object is returned if there is a match, otherwise None is returned.
pattern.search(string) → Match
Search a candidate string for the compiled regular expression, pattern. Search means that the regular
expression must be found somewhere in the candidate string. A Match object is returned if the pattern
is found, otherwise None is returned.
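The difference between the two methods shows up clearly when the pattern sits in the middle of the candidate string. This is a small sketch with an arbitrary date pattern.

```python
import re

pattern = re.compile(r"\d+/\d+/\d+")
text = "Birth Date: 3/8/87"

at_start = pattern.match(text)     # the string does not begin with digits
anywhere = pattern.search(text)    # the date is found later in the string

print(at_start)
print(anywhere.group())
```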
If search() or match() find the pattern in the candidate string, a Match object is created to describe the
match. The following methods are part of a Match object; we'll use the variable name match.
Tip: Debugging Regular Expressions If you forget to import the module, then you get NameError on every
class, function or variable reference.
If you spell the name wrong on your import statement, or the module isn't on your Python Path, you'll
get an ImportError. First, be sure you've spelled the module name correctly. If you import sys and then
look at sys.path, you can see all the places Python looks for the module. You can look in each of those
directories to see what the files are named.
There are two large sources of trouble with regular expressions: getting the regular expression
wrong and getting the processing wrong.
The regular expression language, with its special characters, escapes, and heavy use of \ is rather difficult
to learn. If you get error exceptions from re.compile(), then your RE pattern is improper. For example,
error: multiple repeat means that your RE is misusing "*" characters. There are a number of these
errors which indicate that you are likely missing a \ to escape the special meaning of one or more characters
in your pattern.
If you get TypeError errors from match() or search(), then you have not used a candidate string with your
pattern. Once you've compiled a pattern with pat= re.compile("some pattern"), you use that pattern
object with candidate strings: matching= pat.match("candidate"). If you try pat.match(23), 23 isn't
a string and you get a TypeError.
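Both kinds of visible problem can be provoked on purpose, which is a handy way to learn what the messages look like. This sketch collects the messages in a list named problems, an arbitrary name.

```python
import re

problems = []

try:
    re.compile("a**")          # two repeat characters in a row
except re.error as problem:
    problems.append("bad pattern: %s" % problem)

pat = re.compile("some pattern")
try:
    pat.match(23)              # 23 is not a string
except TypeError as problem:
    problems.append("bad candidate: %s" % problem)

for text in problems:
    print(text)
```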
Beyond these very visible problems is the more subtle problem of a pattern that doesn't match what
you think it should match. We'll look at this separately, in More Debugging Hints.
If your parsing isn't working, then a test script like the following helps to debug the patterns so you can see
what is matching and being parsed and what is being ignored.
import re
pat= re.compile( r"(\d+):(\d+)" )
m= pat.match( "23:59" )
In this last example, you'll note that our pattern matched digits, but our test data included a ".". Either our
test is wrong, or our pattern is wrong. This is the art of debugging: what was really supposed to happen?
Did it happen?
In this case, we'll have to rewrite the pattern to get the test to pass.
Unit Test Framework. This way of testing our patterns is so important, we sometimes create separate
modules just for proving that our patterns work. The example shown above with assert statements is just
the tip of the iceberg.
The Python unittest module provides a way to create special test modules that exist simply to prove that
our software really works as intended.
This is beyond the scope of this book, so we'll stick with simple scripts that use the assert statement.
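A simple assert-based script might look like the following sketch. The function name test_pattern() and the messages are made up; the pattern is the hours-and-minutes pattern used earlier.

```python
import re

pat = re.compile(r"(\d+):(\d+)")

def test_pattern():
    m = pat.match("23:59")
    assert m is not None, "should match hh:mm"
    assert m.groups() == ("23", "59"), "should capture hours and minutes"
    assert pat.match("hi mom") is None, "should not match ordinary text"

test_pattern()
print("all pattern tests passed")
```

If any assert fails, Python stops with an AssertionError that shows the message, which points at the expectation that was not met.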
12:46:50,109 INFO
12:46:50,109 INFO
12:46:50,125 INFO
12:57:14,046 INFO
12:57:18,875 INFO
12:57:19,625 INFO
This sequence decodes a complex input value into individual fields and then computes a single result.
>>> import re
>>> datePat= re.compile(r"(\d\d\d\d)-(\d\d)-(\d\d)")
>>> logLine = "2003-07-28 12:46:50,109 INFO [main] [] - Export directory does not exist"
>>> dateMatch= datePat.match( logLine )
>>> dateMatch.group( 0, 1, 2, 3 )
('2003-07-28', '2003', '07', '28')
>>> y,m,d= map( int, dateMatch.group(1,2,3) )
>>> import datetime
>>> lineDate= datetime.date( y, m, d )
>>> lineDate
datetime.date(2003, 7, 28)
CHAPTER
FIFTEEN
If you look at Python application programs, you'll see that the name of the application almost always matches
one of the file names. For example, the IDLE application is launched by a file named idle.py. This file
contains the main part of the application. IDLE has numerous other files, which contain class and function
definitions.
Program Varieties. There are several subspecies of programs. We touched on this concept in Files are
the Plumbing of a Software Architecture.
In this book, we've focused exclusively on command-line interface (CLI) programs because they are simpler
to create. A richly interactive Graphic User Interface (GUI) program is generally more complex to build.
Further, the core functionality for a GUI is often easiest to develop and debug as a CLI program. Once you
have the CLI program working, you can wrap it up with a GUI.
To some programmers it seems more logical to design the user experience of a GUI first, and get the windows,
menus, and buttons to work first. After all, they argue, the user's interaction is the most important part
of the software. As a practical matter, however, this doesn't work out well. It turns out to be far better to
get the essential data and processing defined and working first. Once this works reliably and correctly, it's
easy to add a GUI to an already working program.
What this usually means is that we have the following structure.
One or more modules that define the essential work of the program. This is a model of the real
world defined with Python objects.
We often write a command-line application script that imports the model.
We can also write a GUI application script that imports the model. This includes the graphical view
and the control logic.
This clean separation between the modules that do the work and the modules that provide the user experience
makes our life simpler in the long run because each side of the application can be focused on a particular
part of the task.
We'll return to these varieties of main programs in Architectural Patterns - A Family Tree.
Evolution. Programs are built up from modules. In some cases, a program evolves as a series of modules.
First, we start with something really basic. Then we write a module that imports our first module, and
implements better input and output. Then we figure out how the optparse module works and we write a
module which imports the second and adds a better CLI. Then we write a GUI in GTK, which imports all
of our previous modules. At each step, we are building additional features around the original small core of
data or processing.
Sometimes, we create a program using someone else's complete program. We might expand on someone
else's program or we might be knitting two programs together to make something new.
In all of these cases, we will have modules which can be used as main programs, but are also absorbed into
a larger and more complex program. Python gives us a very elegant mechanism for turning a main program
into a module that can be imported into a larger program.
The __name__ variable. The global __name__ variable is the name of the currently executing module. It
helps us determine if a module is the main module (the module being run by Python) or a library module
being imported.
When the __name__ variable is equal to '__main__', the initial (or top-level or outermost) file is
being processed. When a module is being imported, the __name__ variable is the name of the module being
imported.
If a module is the main program, it must do the useful work. If it is being imported, on the other hand,
it is merely providing definitions to some other main program, and should do no work except provide class
and function definitions.
You can type the following at the command line prompt in IDLE. If you want to experiment, create a file
with just one line: print(__name__) and import this to see what it does.
>>> __name__
'__main__'
This __name__ variable allows a module to be used as both a main program and as a library for another
program. This can be called the main-import switch, as it helps a module determine if it is the main
program or it is an import into another main program. It gives us the ultimate flexibility to expand, refine
and reuse our modules for a variety of purposes.
A main program script generally looks like the following.
#!/usr/bin/env python
"""Module docstring"""
import someModule

def main():
    *the real work*

if __name__ == "__main__":
    main()
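To make the template concrete, here is a small runnable sketch. The temperature conversion and the function name f_to_c() are made-up stand-ins for "the real work"; any module of your own follows the same shape.

```python
#!/usr/bin/env python
"""Convert a few Fahrenheit temperatures to Celsius."""

def f_to_c(fahrenheit):
    """A reusable definition: other programs can import and call this."""
    return (fahrenheit - 32) * 5.0 / 9.0

def main():
    """The useful work, done only when this file is the main program."""
    for fahrenheit in (32, 98.6, 212):
        print(fahrenheit, "F is", round(f_to_c(fahrenheit), 1), "C")

if __name__ == "__main__":
    main()
```

Run as a script, the file prints its little table. Imported by another module, it prints nothing and simply provides f_to_c().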
The operands are not decorated with punctuation; usually they are file names, but could be permissions or
user names.
For example, we might do an ls -s /usr, which provides an option of -s and an argument of /usr. (For
Windows, an example is dir /o:s C:\Documents and Settings, which has an option of /o:s and an
argument of "C:\Documents and Settings".)
When the program runs, we see two kinds of output, usually intermixed into one stream. We see the output
plus any error messages. We can use some redirection operators like > to capture the output and send it to
a file. We can use 2> to capture the errors and send them to a file.
This redirection is beyond the scope of this book, but is covered in all of the books on GNU/Linux programming.
Command-Line Interface (CLI) programs. There are two critical features that make a CLI program
well-behaved. First, the program should accept parameters (options and arguments) in a standard manner.
Second, the program should generally limit output to the standard output and standard error files created
by the operating system. When any other files are written it must be by user request and possibly require
interactive confirmation. It's bad behavior to silently overwrite a file.
The standard handling of command-line parameters is given as 13 rules for UNIX commands, as shown in
the intro section of UNIX man pages. These rules describe the program name (rules 1-2), simple options
(rules 3-5), options that take argument values (rules 6-8) and operands (rules 9 and 10) for the program.
1. The program name should be between two and nine characters. This is consistent with most file systems
where the program name is a file name. In the Python environment, the program file is typically the
program name plus an extension of .py. Example: python, idle.py.
2. The program name should include only lower-case letters and digits. The objective is to keep names
relatively simple and easy to type correctly. Mixed-case names and names with punctuation marks can
introduce difficulties in typing the program name correctly.
3. Option names should be one character long. This is difficult to achieve in complex programs. Often,
options have two forms: a single-character short form and a multi-character long form. Example: ls
-a, rm -i *.pyc.
4. Single-character options are preceded by -. Multiple-character options are preceded by --. All options
have a flag that indicates that this is an option, not an operand. Single-character options, again, are
easier to type, but may be hard to remember for new users of a program.
5. Options with no arguments may be grouped after a single -. This allows a series of one-character
options to be given in a simple cluster. Example: ls -ldai clusters the -l, -d, -a and -i options.
6. Options that accept an argument value use a space separator. The option arguments are not run
together with the option. Without this rule, it might be difficult to tell an option cluster from an option
with arguments. Example: cut -ds is an argument value of s for the -d option.
7. The argument value to an option cannot be optional. If an option requires an argument value, presence
of the option means that an argument value will follow. The option is already optional; having an
optional argument doesn't make much sense.
8. Groups of option-arguments following an option must be a single word; either separated by commas
or quoted. A space would mean another option or the beginning of the operands. Example: -d
"9,10,56": three numbers separated by commas form the argument value for the -d option.
9. All options must precede any operands on the command line. This basic principle assures a simple,
easy to understand uniformity to command processing.
10. The string -- may be used to indicate the end of the options. This is particularly important when any
of the operands begin with - and might be mistaken for an option.
11. The order of the options relative to one another should not matter. Generally, a program should absorb
all of the options to set up the processing. Example: ls -l -a is the same as ls -a -l and ls -la.
12. The relative order of the operands may be significant. This depends on what the operands mean
and what the program does. The operands are often file names, and the order in which the files are
processed may be significant.
13. The operand - preceded and followed by a space character should only be used to mean standard input.
This may be passed as an operand, to indicate that the standard input file is processed at this time.
Example: cat file1 - file2 will process file1, standard input and file2 in that order.
Parsing Command-Line Options. These rules are handled by the getopt module, the optparse module
and the sys.argv variable in the sys module.
Important: But Wait! This is fine for GNU/Linux, but what about Windows?
Windows programmers have several choices. The most common solution is to use the UNIX rules. They are
compatible with Windows, simple and, most important, standardized by POSIX. This means that your
program will use the - character for options, where the Microsoft-supplied programs will use /. How often
do you use the Microsoft-supplied programs?
Another choice is to extend the getopt or optparse modules to handle Windows punctuation rules. This
would allow you to seamlessly fit with the Microsoft command-line programs.
And, of course, you can always write your own option parser that looks for arguments which begin with /.
The command line arguments used to start Python are put into the sys.argv variable of the sys module as
a sequence of strings.
For example, when we run something like
python casinosim.py -g craps
The operating system (Linux or Windows) sees the python command and runs the Python interpreter,
passing the remaining arguments to the Python interpreter as a list of strings: ["casinosim.py", "-g",
"craps"].
The first operand to the Python interpreter is always the top-level script to run. Python sets __name__ to
"__main__" and executes the file, casinosim.py. The list of strings, beginning with the script name, is placed into sys.argv.
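As a small illustration of how a script might look at sys.argv, here is a sketch. The function name split_argv is made up for this example. The point it shows is that sys.argv[0] holds the script name, and the options and operands follow it.

```python
import sys

def split_argv(argv):
    """Separate the script name from the remaining command-line words."""
    script = argv[0]      # the top-level script Python was asked to run
    words = argv[1:]      # everything the user typed after the script name
    return script, words

if __name__ == "__main__":
    script, words = split_argv(sys.argv)
    print(script, words)
```

Running python casinosim.py -g craps with this code in casinosim.py would hand split_argv() the list ["casinosim.py", "-g", "craps"].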
Overview of optparse. First, of course, we have to think about our main program and how we want to use
it. Once we've figured out the arguments and options, we can then use optparse to transform the arguments
in sys.argv into options and arguments our program can use.
The optparse module parses the command-line options in a three-step process.
1. Create an empty parser.
2. Define the options that this parser will handle.
3. Parse the arguments. This gives you a tuple with two objects. One object has the options as attributes.
The other object is a list of the arguments that followed the options.
Once we have the options and arguments, we can then do the real work of our program.
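The three steps can be sketched as follows. This is only an illustration: the function name parse_command_line, the two options and the sample argument list are assumptions chosen for the example. The argument list is passed in explicitly instead of being taken from sys.argv, which makes it easy to see clustered options (rule 5) and the -- marker (rule 10) at work.

```python
import optparse

def parse_command_line(argv):
    # Step 1: create an empty parser.
    parser = optparse.OptionParser()
    # Step 2: define the options this parser will handle.
    parser.add_option("-v", action="count", dest="verbosity", default=0)
    parser.add_option("-s", action="store", type="int", dest="samples", default=100)
    # Step 3: parse; the result is an options object and a list of operands.
    options, args = parser.parse_args(argv)
    return options, args

# "-vv" is a cluster of two -v options; "--" ends the options, so the
# operand that begins with "-" is not mistaken for an option.
options, args = parse_command_line(["-vv", "-s", "25", "--", "-odd-name.csv"])
print(options.verbosity, options.samples, args)
```

Keeping the parsing in a function that accepts a list makes it possible to try different command lines from the interactive prompt.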
Parameter Parsing. Let's say we polished up some of our exercises to create a complete program with the
following synopsis.
portfolio.py [-v] [-h] [-d mm/dd/yy] [-s symbol] file
-v
Verbosity. This can be repeated to increase the detail of the logging.
-h
Help. Provides a summary of portfolio.py.
-d mm/dd/yy
A particular sale date at which to evaluate the portfolio.
-s symbol
A particular symbol to select from the portfolio.
file
The name of a file with the portfolio data in CSV format.
These options can be processed as follows:
import optparse
parser = optparse.OptionParser()
# -h is automatically added by default
parser.add_option( "-v", action="count", dest="verbosity" )
parser.add_option( "-d", action="store", dest="date" )
parser.add_option( "-s", action="store", dest="symbol" )
options, filenames = parser.parse_args()
The -v option leads to verbose output, where every individual toss of the dice is shown. Without the -v
option, only the summary statistics are shown. The -s option tells how many samples to create. If this is
omitted, 100 samples are used.
Here is the entire file. This program has a five-part design pattern that we've grouped into three sections.
dicesim.py
 1  #!/usr/bin/env python
 2  """dicesim.py
 3
 4  Synopsis:
 5  dicesim.py [-v] [-s samples]
 6  -v is for verbose output (show each sample)
 7  -s is the number of samples (default 100)
 8  """
 ...
23  def main():
24      parser = optparse.OptionParser()
25      parser.add_option( "-v", "--verbose", action="count", dest="verbosity" )
26      parser.add_option( "-s", "--samples", action="store", type="int", dest="samples" )
27      parser.set_defaults( verbosity=0, samples=100 )
28      options, args = parser.parse_args()
29      dicesim( options.samples, options.verbosity )
30
31  if __name__ == "__main__":
32      main()
2. Docstring. The docstring provides the synopsis of the program, plus any other relevant documentation. This should be reasonably complete. Each element of the documentation is separated by blank
lines. Several standard document extract utilities expect this kind of formatting.
10. Imports. The imports line lists the other modules on which this program depends. Each of these
modules might have the main-import switch and a separate main program. Our objective is to reuse
the imported classes and functions, not the main function.
14. Actual Processing. This is the actual heart of the program. It is a pure function with no dependencies
on a particular operating system. It can be imported by some other program and reused.
23. Argument Decoding in Main. This is the interface between the operating system that initiates
this program and the actual work in dicesim(). This does not have much reuse potential.
31. Main Import Switch. This makes the determination if this is a main program or an import. If
it is an import, then __name__ is not "__main__", and no additional processing happens beyond the
definitions. If it is the main program, then __name__ is "__main__"; the arguments are parsed by the
function main(), which calls dicesim() to do the real work.
This is a typical layout for a complete Python main program. We strive for two objectives. First, keep the
main() program focused; second, provide as many opportunities for reuse as possible.
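The body of dicesim() is not shown in the listing, so here is one hypothetical shape for that pure function. The name dicesim and its two parameters come from the call in main(); the body, the returned dictionary and the optional rng parameter are assumptions made for this sketch.

```python
import random

def dicesim(samples, verbosity, rng=random):
    """Roll two dice `samples` times; return a count of each total."""
    counts = {}
    for i in range(samples):
        d1 = rng.randint(1, 6)
        d2 = rng.randint(1, 6)
        total = d1 + d2
        counts[total] = counts.get(total, 0) + 1
        if verbosity > 0:
            # Verbose mode shows every individual toss.
            print(d1, d2, total)
    return counts

# A seeded generator makes the run repeatable while experimenting.
summary = dicesim(100, 0, rng=random.Random(42))
for total in sorted(summary):
    print(total, summary[total])
```

Because the function has no dependency on sys.argv or the operating system, another program can import it and call it directly, which is the reuse the layout is aiming for.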
Further, Python also optimizes the modules brought in by the import statement so that they are only
imported once.
The exec statement is similar to import, except it does not create a module object. Consequently, it doesn't
do any optimization to execute a module file just once.
The exec statement executes a suite of Python statements.
exec expression
The expression can be an open file (created with the open() function), a string value which contains Python
language statements, or a code object created by the compile() function.
Additionally, this form of the exec statement executes in a given namespace.
exec expression in namespace
The namespace is a dictionary that will be used for any global variables created by the statements executed.
>>> code="""a= 3
... b= 5
... c= a*b
... """
>>> a
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>> results = {}
>>> exec code in results
>>> results['a']
3
>>> results['c']
15
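In Python 3, exec is a built-in function and no longer a statement, so the session above is written a little differently there. A minimal sketch of the same idea, using the same variable names:

```python
code = """a = 3
b = 5
c = a * b
"""

results = {}          # the namespace that will receive a, b and c
exec(code, results)   # Python 3 form of: exec code in results

print(results['a'], results['c'])   # prints: 3 15
```

The variables a, b and c are created inside the results dictionary, not among our own global variables.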
Command-Line Interface (CLI) Programs. These programs are run from the command-line
prompt or put into shell scripts. We've looked at these in detail, since they form the basis for all
other kinds of programs.
Graphic User Interface (GUI) Programs. These programs are generally started by double-clicking an icon. These programs are interactive, allowing a user to create and manipulate data
objects. All games are GUI programs; our office suite, including word processors, spreadsheets,
graphics programs, schedule managers and contact managers are interactive GUI programs.
Almost universally, a good GUI program is a graphical veneer wrapped around a core of essential
processing. That core often has a CLI as well as a GUI. For this reason, we focus on CLI
programming.
Embedded Control Programs. This is the software that controls a device or system like a
dishwasher, microwave oven, heat pump, robot or radar system. This is beyond the scope of this
book. It's also not the best application for Python.
Programs That Share Resources Through The Internet. We also call these client-server programs: a client application communicates with a server application using the Internetworking protocols. Sometimes, additional middleware is used to facilitate cooperation between client and server
programs.
Web Applications. Web applications are one species of client-server programs. They use the
HTTP (Hypertext Transfer Protocol). In this case a browser is the client of a web server.
File Transfers. The File Transfer Protocol (FTP) can be used to copy files from machine to
machine on the internet. In this case an FTP client connects with an FTP server.
Email. The SMTP, POP and IMAP protocols can be used for various parts of email processing.
SMTP is the Simple Mail Transfer Protocol and handles routing of email. POP (Post-Office
Protocol) and IMAP (Internet Message Access Protocol) are ways to handle individual mail
messages.
Database. Many database applications are client-server architectures. A client application will
access a database server. There are some standard protocols for this (ODBC and JDBC). There
are many more non-standard protocols.
Python applications try to make the non-standard database protocols at least conform to a standard interface specification called DB-API.
There are many, many Internetworking Protocols that form the basis for client-server programming.
These include DHCP (Dynamic Host Configuration Protocol), DNS (the Domain Name System), NTP
(Network Time Protocol), SSH (Secure Shell), SNMP (Simple Network Management Protocol). All of
these protocols define a client and server relationship.
Compiler. A compiler usually performs extremely complex transformations from one or more input
files to create an output file. Often, the input is a language of some sort, similar to the Python
language.
Interpreter. In an interpreter, statements in a language are read and processed. Some Unix utilities
(like awk) combine filtering and interpreting. A database server (like MySQL or Oracle) is actually a
kind of interpreter: it accepts statements in the SQL language, and uses those statements to create,
modify or extract information from a database.
A tremendous amount of data processing can be accomplished with these basic flavors of programs. When
we have complex problems, we can often use these patterns to decompose the problem into smaller problems,
each of which is easier to solve in isolation. Then we can knit these smaller solutions together to tackle our
real data processing problem.
We often write CLI programs that use Internet-based resources. Just as a hint, you'll want to make use of
urllib or urllib2. These modules allow your CLI program to read an Internet resource as if it were a local
file. With these modules, you don't need a browser or other complex graphical program to do useful work
on the Internet.
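Here is a rough sketch of what reading an Internet resource "as if it were a local file" can look like. It is written for Python 3, where urllib and urllib2 were reorganized into urllib.request. The function name fetch_text and the sample data are invented for this example, and the demonstration reads a temporary local file through a file: URL so that it runs without a network connection; an http: URL is opened the same way.

```python
import os
import pathlib
import tempfile
import urllib.request

def fetch_text(url):
    """Open a URL like a file and return its contents as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")

# Demonstration with a temporary local file; no network is required.
handle, name = tempfile.mkstemp(suffix=".txt")
with os.fdopen(handle, "wb") as target:
    target.write(b"symbol,shares\nGE,100\n")
url = pathlib.Path(name).resolve().as_uri()   # a file: URL for the temporary file
text = fetch_text(url)
os.remove(name)
print(text)
```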
An application could, for example, use urllib2 to make a request of a web server.
3. The server will receive and process the request. It will respond in some way to the client.
An HTTP server, for example, might receive a GET request for a particular URL. It would locate the
relevant file and send the requested page back to the client.
4. The client will take action based on the server's response.
In the case of a web browser, it receives the HTML page, makes requests for any of the additional
images mentioned, and then renders the page by drawing it in the browser window.
appropriate libraries and frameworks, and determine what you need to do to apply all that technology to
your problem. We can't really look at this in too much depth because there are simply so many choices.
We can, however, talk about the initial steps of transforming an idea into software. Some folks call this
"turning the corner" from analyzing the problem to creating a solution. Things work out well when this is
a gradual shift in focus and not an abrupt change in technology and terminology.
We'll look at this in Elaboration : Overcoming Obstacles.
Construction. The bulk of this book was about construction of software. There are some more things we
can and will say about construction. There are two specific areas that are important aspects of the work,
but aren't specifically related to Python or writing programs in the Python language.
Quality Assurance : Does it Work?
Configuration Management : Pieces and Parts
Transition. The final step in software development is the transition from the developer's hands to the user's
hands. The reason we raised the curtain on this four-act play was to create software that someone can use
to be happier and more productive.
Software that you've built for your own use can still benefit from a formal move into a finished area. It's
good to close the curtain on development and call a project complete.
We'll look at some ways of doing this in Transition : Installing the Final Product.
During the opening acts of software development, we merely want to identify the use cases in some general
way. We want to give them titles, perhaps define which actors will engage in them. We might want to state
the goals for each use case.
As we move into Elaboration, we'll provide more extensive information.
Often, we start with class definitions. To write our class definitions, it sometimes helps to rough out a design
as a simple file of definitions with document strings. It helps to put a crisp responsibility statement and a
list of collaborators in the docstring to help focus on what a class does.
class Bin( object ):
    """A bin on a roulette wheel.

    Responsibilities:
    - Number of bin
    - Color
    - Even/Odd
    - High/Low
    - Red/Black
    - Column number
    - etc.

    Collaborates with:
    - Wheel
    """
    pass

class Wheel( list ):
    """A collection of 38 Bins.

    Responsibilities:
    - Holds all the bins
    - Picks a bin at random

    Collaborators:
    - Bin
    - Some simulation
    """
    pass

class Simulation( object ):
    """TODO: don't know quite what this does...

    Responsibilities:

    Collaborators:
    - Wheel
    """
    pass
We can use a file like this for thinking out loud about our design. What classes do we think we'll need?
What will they do? What real-world thing do they parallel?
Simpler Is Better. First, we have to paraphrase Albert Einstein, and suggest that software must be made
as simple as possible, but no simpler.
We have to add E. W. Dijkstra's observation that simplicity and elegance are unpopular because they
require hard work and discipline to achieve.
To this, I'll add "If it's really hard, you're doing it wrong." Most Python libraries are simple and solve some
problem really well. If you pick the wrong library, you may have to really struggle to get it to do what you
want.
If you find that you're really struggling, it may mean that you've picked the wrong tool for the job. You
might want to stop, take a step back, and look around for alternative tools, or an alternative approach.
Building the Right Thing. Quality Assurance starts during inception when we ask ourselves if we're
really solving the right problem for the right people. This fundamental question, "What problem does this
solve?", must be carried through every step of building software.
It helps to expand this question slightly:
What is the problem?
Who has this problem?
Why do they have this problem?
When and Where do they have this problem?
Often, this fundamental question is ignored. New technology is an attractive nuisance, and too many
programmers are seduced by technology that isn't really helping them solve any problems.
This aspect of quality assurance requires some reflection and consideration. Sometimes it requires wisdom
or insight. Other times it requires someone to ask the dumb question of "why are we building this?" or
"why are we building it this way?"
Build the Thing Right. Quality Assurance contains a more technical consideration, also. This comes
into play during Construction and Transition. This fundamental question, "Does this actually solve the
problem?", must also be asked.
We often expand on this slightly:
Are we using Python (and the various libraries) the right way?
Does our program actually work?
This question should always be followed by "What evidence do you have?" Which leads us to the final
question.
Does it produce correct answers for test cases?
One technique for determining if we're building something the right way is to write tests for all of the
packages, modules, classes and functions we created. Besides providing evidence of correct behavior, writing
tests is a very popular way to gain experience with the Python language, the libraries, and the modules and
classes we're trying to design.
This sense of quality assurance requires technical tools. We'll look at two Python modules that help with
using tests as part of quality assurance.
doctest. This module examines the docstrings for classes and functions to locate test scenarios.
unittest. This module executes unit test scripts. A unit test script is a special-purpose module
designed to test other modules.
A Fixture That Needs Testing. Let's look at a really simple module that defines a single function. We'll
show examples of how to write tests for this module.
Here's our initial version of this module, without giving much thought to how we would test this module
for correct behavior.
wheel.py
#!/usr/bin/env python
"""A really simple module."""
def even( spin ):
    """even( spin ) -> true if the spin is even and not zero.
    """
    return spin % 2 == 0
Note carefully that our docstring for the even() function doesn't match the actual definition. We have a
bug, and we'll show how testing reveals this bug.
Doctest Example. To use doctest, we need to write docstrings that include actual examples of using the
function. We make those examples look like they were copied and pasted from Python in interactive mode.
Let's put this function in a module called wheel.py.
wheel.py
#!/usr/bin/env python
"""A really simple module.

>>> import wheel
>>> wheel.even( 2 )
True
"""

def even( spin ):
    """even( spin ) -> true if the spin is even and not zero.

    >>> even( 1 )
    False
    >>> even( 2 )
    True
    >>> even( 0 )
    False
    """
    return spin % 2 == 0
1. In the module docstring, we show how the module as a whole should be used. We created an interactive
Python log showing how we expect this module to behave in general.
2. In the even() docstring, we also show how the function should be used. In this case, we wrote the log
we expected to see. This docstring shows what would happen if the function was written correctly.
Here's a separate script that will examine the docstrings, locate the test sequences, execute them, and
compare actual results against the expected results in the docstrings.
import wheel
import doctest
doctest.testmod(wheel)
2. We import doctest.
3. We evaluate the doctest.testmod() function against the given module.
Here's the output from running this.
MacBook-5:notes slott$ python test1_doctest.py
**********************************************************************
File "/Users/slott/Documents/demo/roulette/wheel.py", line 16, in wheel.even
Failed example:
    even( 0 )
Expected:
    False
Got:
    True
**********************************************************************
1 items had failures:
   1 of   3 in wheel.even
***Test Failed*** 1 failures.
Interestingly, our even() function has a bug. In Roulette, the numbers 0 and 00 are neither even nor odd.
Our even() function doesn't handle 0 at all.
Python 3
Starting with Python 2.6, we don't need the little 3-line test driver module to run doctest. We will be
able to run the entire test suite from the command line.
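One possible correction can be sketched as follows, under the assumption that spin is an integer and that a spin of 0 stands for the zero bins (how 00 is represented is not specified here). The doctest examples are the ones from the even() docstring above. A module written this way could be checked from the command line with python -m doctest wheel.py.

```python
import doctest

def even(spin):
    """even( spin ) -> true if the spin is even and not zero.

    >>> even( 1 )
    False
    >>> even( 2 )
    True
    >>> even( 0 )
    False
    """
    # Zero is excluded explicitly; every other number is tested for evenness.
    return spin != 0 and spin % 2 == 0

# Run only this function's docstring examples; silence means they all passed.
doctest.run_docstring_examples(even, {"even": even})
```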
Unittest Example. To use unittest, we need to write a module that includes some test cases and a main
script. The test cases are formalized as subclasses of unittest.TestCase. Each of these class definitions
embodies a series of test methods applied to a test fixture.
 1  #!/usr/bin/env python
 2  import unittest
 3
 4  import wheel
 5
 ...
15  if __name__ == "__main__":
16      unittest.main()
2. We import unittest.
4. We import the module were going to test.
6. We dene a subclass of unittest.TestCase.
Each method function that begins with test... is a different test of our fixture; in this case,
the function wheel.even().
Each method function is based on an assert or fail method function of unittest.TestCase.
There are dozens of these method functions to help us specify the behavior of our fixture. In this
case, we used the assertTrue() and assertFalse() functions.
15.3. Professionalism : Additional Tips and Hints
15. We use the main program switch to execute unittest.main() only when this module is the main
module.
The unittest.main() function will locate all of the subclasses of TestCase. It locates all method
functions with names that start with test. It will then create the object, execute the methods, and
count the number of tests that pass, fail or have errors.
Here's the output.
MacBook-5:notes slott$ python test1_unittest.py
F..
======================================================================
FAIL: test_0 (__main__.TestEven)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "notes/test1_unittest.py", line 8, in test_0
    self.assertFalse( wheel.even( 0 ) )
AssertionError
----------------------------------------------------------------------
Ran 3 tests in 0.001s

FAILED (failures=1)
The first part (F..) is a summary of the tests being run. It shows a test failure followed by two successes.
This is followed by the details of each failure.
If everything works, you'll see a string like ... and nothing more.
This shows the bug in our even(). In Roulette, the numbers 0 and 00 are neither even nor odd. Our even()
function doesn't handle 0 correctly.
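The body of the TestEven class does not appear in the listing. The sketch below is one reconstruction consistent with the traceback: the class name, test_0 and its assertion come from the output above, while test_1 and test_2 are assumed names. To keep the sketch self-contained, a corrected even() stands in for the imported wheel module, so all three tests pass, and the tests are run through a TestLoader and TextTestRunner in place of unittest.main().

```python
import io
import unittest

def even(spin):
    """Stand-in for wheel.even(), with the zero case handled."""
    return spin != 0 and spin % 2 == 0

class TestEven(unittest.TestCase):
    def test_0(self):
        self.assertFalse(even(0))   # zero is neither even nor odd in Roulette
    def test_1(self):
        self.assertFalse(even(1))
    def test_2(self):
        self.assertTrue(even(2))

# Collect the test methods, run them, and capture the report as text.
suite = unittest.TestLoader().loadTestsFromTestCase(TestEven)
stream = io.StringIO()
result = unittest.TextTestRunner(stream=stream).run(suite)
print(stream.getvalue())
```

With the original even(), test_0 is the test that fails, which matches the F in the first position of the summary line.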
This will allow you to do the following to install your module in the site-packages library.
Once you've done this, you can work in any directory on your computer and have access to your new module.
This allows you to create new applications based on modules you've already written.
CHAPTER
SIXTEEN
APPENDICES
16.1 Debugging Tips
16.1.1 Let There Be Python: Downloading and Installing
Tip: Debugging Windows Installation
The only problem you are likely to encounter doing a Windows installation is a lack of administrative
privileges on your computer. In this case, you will need help from your support department to either do the
installation for you, or give you administrative privileges.
4. Open the Environment Variables of the Advanced Tab of the System Control Panel
Click the Environment Variables... button.
This dialog box has a title of Environment Variables. It shows two areas: user variables and System
variables. We'll be updating one of the system variables.
5. Edit the Path variable
This dialog box has a title of Environment Variables. Scroll through the list of System variables,
looking for Path. Click on the Path to highlight it.
Click the Edit... button.
This dialog box has a title of Edit System Variable. It has two sections to show the variable name of
Path and the variable value.
6. Add Python's location to the Path value
This dialog box has a title of Edit System Variable. It has two sections to show the variable name of
Path and the variable value.
Click on the value and use the right arrow key to scroll through the value you find. At the end, add
the following ;C:\python26. Don't forget the ; to separate this search location from other search
locations on the path.
Click OK to save this change. It is now a permanent part of your Windows setup on this computer.
You'll never have to change this again.
7. Finish Changing Your System Properties
The current dialog box has a title of Environment Variables. Click OK to save your changes.
The current dialog box has a title of System Properties. Click OK to save your changes.
If your alias doesn't work, there are some common things to confirm:
Your .profile works correctly. You can type sh -v .profile or bash -v .bash_profile to test it. If
you see error messages, likely you missed an apostrophe or messed up the spaces.
The ... is Python's hint that the statement is incomplete. You'll need to finish the ()'s so that the statement
is complete.
There are two common mistakes in the augmented assignment statement. The first is to choose an illegal
variable name. If you get a SyntaxError: can't assign to literal or SyntaxError: invalid syntax
the most likely cause is an illegal variable name.
The other mistake is to have an invalid expression on the right side of the assignment operator. If the result
of an assignment statement doesn't look right, remember that you can always enter the various expressions
directly into IDLE's Python shell to examine the processing one step at a time.
d1+d2 == 12:
pays 2:1")
d1+d2==9 or d1+d2==10 or d1+d2==11:
pays even money")
loses")
Here's the subtle bug in this example. We test for 2 and 12 in the first clause; we test for 4, 9, 10 and 11
in the second. It's not obvious that a roll of 3 is missing from the "field pays even money" condition. This
fragment incorrectly treats 3, 5, 6, 7 and 8 alike in the else:.
While the else: clause is used commonly as a catch-all, a more proper use for else: is to raise an exception
because a condition was found that was not matched by any of the if or elif clauses.
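A sketch of the safer pattern, assuming the field bet as described above, with 3 added to the even-money rolls. Every outcome is named in its own clause, and else: is reserved for values that should be impossible. The function name field_outcome and the returned strings are invented for this illustration.

```python
def field_outcome(d1, d2):
    """Classify a roll of two dice for the field bet."""
    roll = d1 + d2
    if roll in (2, 12):
        return "pays 2:1"
    elif roll in (3, 4, 9, 10, 11):
        return "pays even money"
    elif roll in (5, 6, 7, 8):
        return "loses"
    else:
        # No clause matched: a design error, not a normal outcome.
        raise ValueError("impossible roll: %r" % roll)

print(field_outcome(1, 2))
print(field_outcome(3, 4))
```

Had 3 been left out of the second clause here, the first call would raise an exception, and the omission would be noticed right away.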
Change your current working directory to the correct location of your files. For Windows: use
CD; for GNU/Linux and MacOS: use cd. For example, if your files are in an exercises directory,
you can do cd exercises.
Include the directory name on your file. For example, if your files are in an exercises directory,
you can run the script1.py script with python exercises/script1.py.
3. If you can find Python, and you appear to be in the correct directory, the remaining problem is
misspelling the filename for your script. This is relatively common, actually. First-time GNU/Linux
and MacOS users will find that the shell is sensitive to the case of the letters, that some letters look
alike, that it is possible to embed non-printing characters in a file name, and that it is unwise to use letters which
confuse the shell. We have the following advice.
File names in GNU/Linux should be one word, all lower case letters and digits. These are the
standard Python expectations for module names. While there are ways around this by using the
shell's quoting and escaping rules, Python programs avoid this.
File names should avoid punctuation marks. There are only a few safe punctuation marks: -, .
and _. Even these safe characters should not be the first character of the file name.
Some Windows programs will tack an extra .txt on your file. You may have to manually rename
the file to get rid of this.
In GNU/Linux, you can sometimes embed a space or non-printing character in a file name. To
find this, use ls -b to see the non-printing characters. You'll have to resort to fairly complex
shell tricks to rename a badly named file to something more useful. The ? character is a wild-card
which matches any single character. If you have a file named script^M1.py, you can rename this
with mv script?1.py script1.py. The ? will match the unprintable ^M in the file name.
up positional parameters in order. Finally, default values will be applied. There are several circumstances
where things can go wrong.
A parameter is not set by keyword, position or default value
There are too many positional values.
A keyword is used that is not a parameter name in the function denition.
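Each of these three situations raises a TypeError when the function is called. Here is a small sketch, using a made-up function named area and a helper named try_call, that triggers all three and reports the message Python produces.

```python
def area(width, height=1):
    return width * height

def try_call(*args, **kw):
    """Call area() and report either the result or the TypeError message."""
    try:
        return area(*args, **kw)
    except TypeError as error:
        return "TypeError: %s" % error

print(try_call())              # width is not set by keyword, position or default
print(try_call(2, 3, 4))       # too many positional values
print(try_call(2, depth=3))    # depth is not a parameter name of area()
print(try_call(2, height=3))   # a valid call
```

Reading the message carefully usually tells us which of the three problems we have.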
The data in your list isn't regular enough to be sorted. For example, if we have dates that are
represented as strings like '1/10/56', '11/19/85', '3/8/87', these strings are irregular and won't
sort very nicely. As humans, we know that they should be sorted into year-month-date order, but the
strings that Python sees begin with '1/', '11' and '3/', with an alphabetic order that may not be
what you expected.
To get this data into a usable form, we have to normalize it. Normalizing is a computer science term
for getting data into a regular, consistent, usable form. In our example of sorting dates, we'll need
to use the time or datetime modules to parse these strings into proper Python objects that can be
compared.
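Here is a minimal sketch of normalizing before sorting. The helper name parse_mdy is invented for this example, and it assumes that every two-digit year means 19yy. The date '2/1/55' is not from the text; it is added so that the alphabetic order and the chronological order visibly differ.

```python
import datetime

def parse_mdy(text):
    """Turn 'm/d/yy' into a datetime.date, assuming the year is 19yy."""
    month, day, year = [int(part) for part in text.split("/")]
    return datetime.date(1900 + year, month, day)

# The first three strings are from the text; '2/1/55' is added for illustration.
dates = ['1/10/56', '11/19/85', '3/8/87', '2/1/55']

print(sorted(dates))                  # alphabetic order of the strings
print(sorted(dates, key=parse_mdy))   # chronological order
```

The key function leaves the original strings in the list; only the comparison is done on the normalized datetime.date objects.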
Tip: Debugging Exception Handling First, we may have the wrong exceptions named in the except
clauses. If we evaluate a statement that raises an exception, but that exception is not named in an except
clause, the exception wont get handled.
Since Python reports the name of the exception, we can use this information to add another except clause,
or add the exception name to an existing except clause. We have to be sure we understand why we're
getting the exception and we have to be sure that our handler is doing something useful. Exceptions like
SystemError, for example, shouldn't be handled: they indicate that something is wrong inside the Python
interpreter itself, not in our program.
You won't know you spelled an exception name wrong until an exception is actually raised and the except
clauses are matched against the exception. The except clauses are merely potential statements. Once an
exception is raised, they are actually evaluated, and any misspelled exception names will cause problems.
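A small sketch makes this visible. The function safe_ratio is made up for the illustration, and the exception name in its except clause is misspelled on purpose.

```python
def safe_ratio(a, b):
    """Divide a by b; the except clause below has a deliberate typo."""
    try:
        return a / b
    except ZeroDivisonError:      # should be ZeroDivisionError
        return None


# No exception is raised, so the except clause is never examined.
print(safe_ratio(6, 3))

# Now an exception is raised, Python evaluates the misspelled name,
# and we get a NameError instead of our handler.
try:
    safe_ratio(6, 0)
except NameError as error:
    print("NameError: %s" % (error,))
```

The function works for every ordinary input, which is exactly why this kind of mistake survives casual testing. We have to test the exceptional case, too.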
Second, we may be raising the wrong exception. If we attempt to raise an exception, but spelled the
exception's name wrong, we'll get a strange-looking NameError, not the exception we expected.
As with the except clause, the exception name in a raise clause is not examined until the exceptional
condition occurs and the raise statement is executed. Since raise statements almost always occur inside if,
elif or else suites, the condition has to be met before the raise statement is executed.
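The same thing can be sketched for the raise statement. The function check_age is hypothetical, and ValueEror is a deliberate misspelling of ValueError.

```python
def check_age(age):
    """Return age unchanged, or complain about a negative value."""
    if age < 0:
        raise ValueEror("age cannot be negative")   # should be ValueError
    return age


print(check_age(42))              # the if condition is false; the typo is never seen

try:
    check_age(-1)                 # now the raise statement is executed
except NameError as error:
    print("NameError: %s" % (error,))
```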
16.1.22 Looping Back : Iterators, the for statement, and the yield statement
Tip: Debugging Iterators
There are several common problems with using an explicit iterator.
Skipping items without processing them.
Skipping items happens when we call next() on the iterator one too many times.
Processing an item twice happens when we forget to call next() on the iterator. We see it
happen when a program picks off the header items, but fails to advance to the next item before processing
the body.
Another common problem is getting a StopIteration exception raised when trying to skip the header item
from a list or the header line from a file. In this case, the file or list was empty, and there was no header.
Often, our programs need the following kind of try block to handle an empty file gracefully.
    i = iter( someSequence )
    try:
        next(i)  # Skips an item on purpose
    except StopIteration:
        pass  # No items -- this is a valid situation, not an error
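Here is a slightly fuller sketch of the same pattern: skip one header item, then process the body, and treat empty input as a perfectly good answer of zero. The function name total_after_header is made up for this example.

```python
def total_after_header(rows):
    """Add up every item after the first, which is treated as a header."""
    row_iter = iter(rows)
    try:
        next(row_iter)            # skip the header on purpose
    except StopIteration:
        return 0                  # empty input: no header and no body
    total = 0
    for row in row_iter:          # the for statement picks up after the header
        total += row
    return total


print(total_after_header(["amount", 10, 20]))
print(total_after_header([]))
```

Because the for statement uses the same iterator object, the header is consumed exactly once. It is neither processed as data nor skipped twice.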
The second example passes a 4-character string, "word", which becomes a 4-element set.
In the case of creating sets from strings, there's no error message. The question is really what did you
mean? Did you intend to put the entire string into the set? Or did you intend to break the string down to
individual characters, and put each character into the set?
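The two intentions can be written side by side. This is just an illustration; the variable names are made up.

```python
letters = set("word")        # the string is broken down into its characters
words = set(["word"])        # the list has one item: the entire string

print(len(letters), sorted(letters))
print(len(words), sorted(words))
```

Wrapping the string in a list (or a tuple) is what tells set() that the string is one item rather than a sequence of characters.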
It is very important to get the file's path completely correct. You'll notice that each time you start IDLE,
it thinks the current working directory is something like C:\Python26. You're probably doing your work in
a different default directory.
When you open a module file in IDLE, you'll notice that IDLE changes the current working directory to the
directory that contains your module. If you have your .py files and your data files all in one directory, you'll
find that things work out well.
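When a file can't be found, it helps to ask Python where it is actually looking. This sketch uses the os module; the function name full_path and the file name mydata.txt are placeholders for illustration.

```python
import os


def full_path(file_name):
    """Show the complete path Python will use for a plain file name."""
    return os.path.join(os.getcwd(), file_name)


print("working directory: %s" % (os.getcwd(),))
candidate = full_path("mydata.txt")
print("%s exists? %s" % (candidate, os.path.exists(candidate)))
```

If the directory printed isn't the one that holds your data, either move the data or give open() the complete path.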
The next most common error is to have the wrong permissions. This usually means trying to write to a
file you don't own, or attempting to create a file in a directory where you don't have write permission. If
you are using a server, or a computer owned by a corporation, this may require some work with your system
administrators to sort out what you want to do and how you can accomplish it without compromising
security.
The [Errno 2] note in the error message is a reference to the internal operating system error numbers.
There are over 100 of these error numbers, all collected into the module named errno. There are a lot of
different things that can go wrong, many of which are very, very obscure situations.
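The errno module can translate the number back into its symbolic name. This is a sketch; the function open_error_name and the file name are invented for the example.

```python
import errno


def open_error_name(path):
    """Try to read a file; return the errno name if that fails, else None."""
    try:
        with open(path) as source:
            source.read()
    except IOError as error:
        # errno.errorcode maps numbers such as 2 to names such as 'ENOENT'
        return errno.errorcode.get(error.errno, "unknown")
    return None


print(errno.errorcode[2])
print(open_error_name("no_such_file_for_this_demo.txt"))
```

The name ENOENT stands for "no such file or directory", which is the text that accompanies [Errno 2].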
Looking at the intermediate results helps us be sure that we are reading the file properly.
Be sure the module's .py file name is correct, and it's located on the sys.path. Module filenames are
traditionally all lowercase, with minimal punctuation. Some operating systems (like GNU/Linux) are case-sensitive and a seemingly insignificant difference between Random.py and random.py can make your module
impossible to find.
The two most visible places to put module files are the current working directory and the Python
Lib/site-packages directory. For Windows, this directory is usually under C:\python26\. For
GNU/Linux, this is often under the /usr/lib/python2.6/ directory. For MacOS users, this will be in
the /System/Library/Frameworks/Python.framework/Versions/Current/ directory tree.
If your module isn't valid Python, you'll get syntax errors when you try to import it. You can discover the
exact errors by trying to execute the module file using the F5 key in IDLE.
If the module doesn't define what you thought, there are two likely causes: the Python definitions are
incorrect, or you've omitted a necessary module-name qualifier. For example, when we do import math
everything in that module requires the math qualifier. Within a module, however, we don't need to qualify
names of other things defined in the same module file.
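The missing qualifier shows up as a NameError. Here's a minimal sketch; the helper unqualified_fails exists only to capture the error for the demonstration.

```python
import math


def unqualified_fails():
    """Return True if using sqrt without the math qualifier is a NameError."""
    try:
        sqrt(16)                  # no qualifier: this name was never defined here
    except NameError:
        return True
    return False


print(math.sqrt(16))              # the qualified name works
print(unqualified_fails())
```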
If your Python class or function definitions aren't correct, it has nothing to do with the modularization. The
problem is more fundamental. Starting from something simple and adding features is generally the best way
to learn.
The sys.path is a list, which is searched in order. Your working directory is searched first. When your module
has the same name as some extension module, your module will conceal that extension module. I've spent
hours discovering that my module named Image was concealing PIL's Image module.
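When we suspect that one module is concealing another, we can ask the imported module where it came from. This sketch uses the __file__ attribute that modules loaded from files carry; the function name where_is is made up for the example.

```python
import random
import sys


def where_is(module):
    """Report the file a module was loaded from, if it has one."""
    return getattr(module, "__file__", "(built-in)")


print(sys.path[:3])               # the first few places Python searches
print(where_is(random))           # should be inside the Python library directory
print(where_is(sys))              # sys is built in, so it has no file
```

If where_is() names a file in your own working directory when you expected the Python library, your module is the one doing the concealing. Renaming it is the simplest cure.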
16.1.30 Time and Date Processing : The time and datetime Modules
Tip: Debugging time
Because of the various conversions, it's easy to get confused by having a floating-point time and a
struct_time time. When you get TypeError exceptions, you are missing a conversion between the two
representations. You can use the help() function and the Python Library Reference (chapter 6.10) to sort
this out.
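The two representations and the conversions between them can be sketched as follows. The helper format_day is made up for illustration; it reports a TypeError instead of letting the program stop.

```python
import time

now_float = time.time()                   # floating-point seconds
now_struct = time.localtime(now_float)    # struct_time: year, month, day, ...
round_trip = time.mktime(now_struct)      # back to floating-point seconds


def format_day(value):
    """Format a struct_time as a date, or say what went wrong."""
    try:
        return time.strftime("%Y-%m-%d", value)
    except TypeError as error:
        return "TypeError: %s" % (error,)


print(format_day(now_struct))     # the right representation
print(format_day(now_float))      # the conversion with localtime() is missing
```

The rule of thumb: time.localtime() and time.gmtime() go from a number to a struct_time, and time.mktime() goes back.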
16.2 Bibliography
16.2.1 Use Cases
16.2.2 Computer Science
16.2.3 Design Patterns
16.2.4 Languages
16.2.5 Project Management
16.2.6 Problem Domains
16.3 Glossary
Here are some perhaps useful definitions of concepts.
computers are responding to your browser's requests. Most of the internet things you see involve your
desktop and a server somewhere else.
We do need to note that we're using the principle of abstraction. A number of electronic devices are all
computers on which we can do Python programming. Laptops, desktops, iMacs, PowerBooks, clients,
servers, Dells and HPs are all examples of this abstraction we're calling a computer system.
Device, Peripheral Device We have a number of devices that are part of our computers. Most devices
are plugged into the computer box and connected by wires, putting them on the periphery of the
computer. A few devices are wireless; they connect using Bluetooth, WiFi (IEEE 802.11) or infrared
(IR) signals. We call the connection an interface.
The most important devices are hidden within the box, physically adjacent to the central processor.
These central items are memory (called random-access memory, RAM) and a disk. The disk, while
inside the box, is still considered peripheral because once upon a time, disks were huge and expensive.
The other peripheral devices are the ones we can see: display, keyboard and mouse. After that are
other storage devices, including CDs, DVDs, USB drives, cameras, scanners, printers, drawing tablets,
etc. Finally we have network connections, which can be Ethernet, wireless or a modem. All devices
are controlled by pieces of software called drivers.
Note that we've applied the abstraction principle again. We've lumped a variety of components into
abstract categories.
Memory, RAM The computer's working memory (Random-Access Memory, or RAM) contains two things:
our data and the processing instructions (or program) for manipulating that data. Most modern
computers are called stored program digital computers. The program is stored in memory along with
the data. The data is represented as digits, not mechanical analogies. In contrast, an analog computer
uses mechanical analogs for numbers, like spinning gears that make an analog speedometer show
the speed, or the strip of metal that changes shape to make an analog meat thermometer show the
temperature.
The central processor fetches each instruction from the computer's memory and then executes that
instruction. We like to call this the fetch-execute loop that the processor carries out. The processor
chip itself is hardware; the instructions in memory are called software. Since the instructions are stored
in memory, they can be changed. We take this for granted every time we double click an icon and a
program is loaded into memory. The data on which the processor is working must also be in memory.
When we open a document file, we see it read from the disk into memory so we can work on it.
Memory is dynamic: it changes as the software does its work. Memory which doesn't change is called
Read-Only Memory (ROM).
Memory is volatile: when we turn the computer off, the contents vanish. When we turn the computer
on, the contents of memory are random, and our programs and data must be loaded into memory from
some persistent device. The tradeoff for volatility is that memory is blazingly fast.
Memory is accessed randomly: any of the 512 million bytes of my computer's memory can be accessed
with equal ease. Other kinds of memory have sequential access; for example, magnetic cassette tapes
must be accessed sequentially.
For hair-splitters, we recognize that there are special-purpose computing devices which have fixed
programs that aren't loaded into memory at the click of a mouse. These devices have their software in
read-only memory, and keep only data in working memory. When our program is permanently stored
in ROM, we call it firmware instead of software. Most household appliances that contain computers
keep their programs in ROM.
Disk, Hard Disk, Hard Drive We call these disk drives because the memory medium is a spinning magnetizable disk with read-write heads that shuttle across the surface; you can sometimes hear the clicking
as the heads move. Individual digits are encoded across the surface of the disk, grouped into blocks
of data. Some people are in the habit of calling them "hard" to distinguish them from the obsolete
floppy disks that were used in the early days of personal computing.
Our various files (or documents), including our programs and our data, will eventually reside
on some kind of disk or disk-like device. However, the operating system interposes some structure,
discipline and protocol between our needs for saving files and the vagaries of the disk device. We'll
look at this in Software Terminology and again in Working with Files.
Disk memory is described as random access, even though it isn't completely random: there are
read-write heads which move across the surface and the surface is rotating. There are delays while the
computer waits for the heads to arrive at the right position. There are also delays while the computer
waits for the disk to spin to the proper location under the heads. At 7200 RPM, one full rotation takes
1/120th of a second (about 8 milliseconds), so you're waiting less than that, but you're still waiting.
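The rotational wait can be worked out from the spindle speed: revolutions per minute divided by 60 gives revolutions per second, and the time for one turn is the reciprocal of that. As a sketch (the function name rotation_time_ms is made up):

```python
def rotation_time_ms(rpm):
    """Milliseconds the disk needs for one full rotation."""
    rotations_per_second = rpm / 60.0
    return 1000.0 / rotations_per_second


worst_case = rotation_time_ms(7200)     # the data has just passed the heads
average = worst_case / 2.0              # on average we wait for half a turn
print("worst case %.2f ms, average %.2f ms" % (worst_case, average))
```

A 7200 RPM disk turns 120 times each second, so a full turn takes a little over 8 milliseconds, and the typical wait is about half of that. The time the heads need to move comes on top of this.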
Your computer's disk can be imagined as persistent, slow memory: when we turn off the computer,
the data remains intact. The tradeoff is that it is agonizingly slow: it reads and writes in milliseconds,
close to a million times slower than dynamic memory.
Disk memory is also cheaper than RAM by a factor of almost 1000: we buy 500 gigabytes (500
billion bytes, or 500,000 megabytes) of disk for $100, the cost of 512 megabytes of memory.
Human Interface, Display, Keyboard, Mouse The human interface to the computer typically consists
of three devices: a display, a keyboard and a mouse. Some people use additional devices: a second
display, a microphone, speakers or a drawing tablet are common examples. Some people replace the
mouse with a trackball. These are often wired to the computer, but wireless devices are also popular.
In the early days of computers (before the invention of the mouse) the displays and keyboards could
only handle characters: letters, numbers and punctuation. When we used computers in the early days,
we spelled out each command, one line at a time. Now, we have the addition of sophisticated graphical
displays and the mouse. When we use computers now, we point and click, using graphical gestures as
our commands. Consequently, we have two kinds of human interfaces: the Command-Line Interface
(CLI), and the Graphical User Interface (GUI).
A keyboard and a mouse provide inputs to software. They work by interrupting what the computer is
doing, providing the character you typed, or the mouse button you pushed. A piece of software called
the Operating System has the job of collecting this stream of input and providing it to the application
software. A stream of characters is pretty simple. The mouse clicks, however, are more complex events
because they involve the screen location as well as the button information, plus any keyboard shift
keys.
A display shows you the outputs from software. The display device has to be shared by a number
of application programs. Each program has one or more windows where its output is sent. The
Operating System has the job of mediating this sharing to assure that one program doesn't disturb
another program's window. Generally, each program will use a series of drawing commands to paint the
letters or pictures. There are many, many different approaches to assembling the output in a window.
We won't touch on this because of the bewildering number of choices.
Historically, display devices used paper; everything was printed. Then they switched to video technology. Currently, displays use liquid crystal technology. Because displays were once almost entirely
video, we sometimes summarize the human interface as the Keyboard-Video-Mouse (KVM).
In order to keep things as simple as possible, we're going to focus on the command-line interface. Our
programs will read characters from the keyboard, and display characters in an output window. Even
though the programs we write won't respond to mouse events, we'll still use the mouse to interact with
the operating system and programs like IDLE.
Other Storage, CD, DVD, USB Drive, Camera These storage devices are slightly different from the
internal disk drive or hard drive. The difference is the degree of volatility of the medium. Packaged
CDs and DVDs are read-only; we call them CD Read-Only Memory (CD-ROM). When we burn our
own CD or DVD, we used to call it creating a Write-Once-Read-Many (WORM) device. Now there
are CD-RW devices which can be written (slowly) many times, and read (quickly) many times, making
the old WORM acronym outdated.
Where does that leave Universal Serial Bus (USB) drives (known by a wide variety of trademarked names
like Thumb Drive or Jump Drive) and the memory stick in our camera? These are just like the
internal disk drive, except they don't involve a spinning magnetized disk. They are slower, have less
capacity and are slightly more expensive than a disk.
Our operating system provides a single abstraction that makes our various disk drives and other
storage all appear to be very similar. When we look at these devices they all appear to have folders
and documents. We'll return to this unification in File-Related Library Modules.
Scanner, Printer These are usually USB devices; they are unique in that they send data in one direction
only. Scanners send data into our computer; our computer sends data to a printer. These are a kind
of storage, but they are focused on human interaction: scanning or printing photos or documents.
The scanner provides a stream of data to an application program. Properly interpreted, this stream of
data is a sequence of picture elements (called pixels) that show the color of a small section of the
document on the scanner. Getting input from the scanner is a complex sequence of operations to reset
the apparatus and gather the sequence of pixels.
A printer, similarly, accepts a stream of data. Properly interpreted, this stream of data is a sequence
of commands that will draw the appropriate letters and lines in the desired places on the page. Some
printers require a sequence of pixels, and the printer uses this to put ink on paper. Other printers use
a more sophisticated page description language, which the printer processes to determine the pixels,
and then deposits ink on paper. One example of these sophisticated graphic languages is PostScript.
Network, Ethernet, Wireless, WiFi, Dial-up, Modem A network is built from a number of cooperating technologies. Somewhere, buried under streets and closeted in telecommunications facilities is
the global Internet: a collection of computers, wires and software that cooperates to route data. When
you have a cable-modem, or use a wireless connection in a coffee shop, or use the Local Area Network
(LAN) at school or work, your computer is (indirectly) connected to the Internet. There is a physical link (a wire or an antenna), and there are software protocols for organizing the data and sharing the
link properly. There are software libraries used by the programs on our computer to surf web pages,
exchange email or purchase MP3s.
While there are endless physical differences among network devices, the rules, protocols and software
make these various devices almost interchangeable. There is a stack of technology that uses the principle
of abstraction very heavily to minimize the distinctions among wireless and wired connections. This
kind of abstraction assures that a program like a web browser will work precisely the same no matter
what the physical link really is. The people who designed the Internet had abstraction very firmly in
mind as a way to allow the Internet to expand with new technology and still work consistently.
assuring that all the programs share those resources. The operating system also manages the various
disk drives by imposing some organizing rules on the data; we call the organizing rules and the related
software the file system.
The operating system creates the desktop metaphor that we see. It manages the various windows; it
directs mouse clicks and keyboard characters to the proper application program. It depicts the file
system with a visual metaphor of folders (directories) and documents (files). The desktop is often
shown to you by a program called the finder or explorer; this program draws the various icons and
the dock or task bar.
In addition to managing devices and resources, the OS starts programs. Starting a program means
allocating memory, loading the instructions from the disk, allocating processor time to the program,
and allocating any other resources in the processor chip.
Finally, we have to note that it is the OS that provides most of the abstractions that make modern
computing possible. The idea that a variety of individual types of devices and components could
be summarized by a single abstraction of storage allows disk drives, CD-ROMs, DVD-ROMs and
thumb drives to peacefully co-exist. It allows us to run out and buy a thumb drive and plug it into
our computer and have it immediately available to store the pictures of our trip to Sweden.
Program, Application, Software A program is started by the operating system to do something useful.
We'll look at this in depth in What is a Program? and Goal-Directed Activities. Since we will be
writing our own programs, we need to be crystal clear on what programs really are and how they make
our computer behave.
There isn't a useful distinction between words like program, command, application, application
program, and application system. Some vendors even call their programs solutions. We'll try to
stick to the word program. A program is rarely a single thing, so we'll try to identify a program with
the one file that contains the main part of the program.
File, Document, Data, Database, the File System The data you want to keep is saved to the disk
in files. Sometimes these are called documents, to make a metaphorical parallel between a physical
paper document and a disk file. Files are collected into directories, sometimes depicted as metaphorical
folders. A paper document is placed in a folder the same way a file is placed in a directory. Computer
folders, however, can have huge numbers of documents. Computer folders, also, can contain other
folders without any practical limit. The document and folder point of view is a handy visual metaphor
used to clarify the file and directory structure on our disk.
This is so important that Working with Files is devoted to how our programs can work with files.
Boot Not footwear. Not a synonym for kick, as in "booted out the door". No, boot is used to describe a
particular disk as the boot disk. We call one disk the boot disk because of the way the operating
system starts running: it pulls itself up by its own bootstraps. Consider this quote from James Joyce's
Ulysses: "There were others who had forced their way to the top from the lowest rung by the aid of
their bootstraps."
The operating system takes control of the computer system in phases. A disk has a boot sector (or
boot block) set aside to contain a tiny program that simply loads other programs into memory. This
program can either load the expected OS, or it can load a specialized boot selection program (examples
include BootCamp, GRUB, or LiLo.) The boot program allows you to control which OS is loaded.
Either the boot sector directly loads the OS, or it loads and runs a boot program which loads the OS.
The part of the OS that is loaded into memory is just the kernel. Once the kernel starts running, it
loads a few handy programs and starts these programs running. These programs then load the rest of
the OS into memory. The device drivers must be added to the kernel. Once all of the device drivers
are loaded, and the devices configured, then the user interface components can be loaded and started.
At this point, the desktop appears.
Note that part of the OS (the kernel) loads other parts of the operating system into memory and
starts them running. It pulls itself up by its own bootstraps. They call this bootstrapping, or booting.
The kernel will also load our software into memory and start it running. We'll depend heavily on this
central feature of an OS.
CHAPTER
SEVENTEEN
CHAPTER
EIGHTEEN
BIBLIOGRAPHY
[Jacobson92] Ivar Jacobson, Magnus Christerson, Patrik Jonsson, Gunnar Övergaard. Object-Oriented Software Engineering. A Use Case Driven Approach. 1992. Addison-Wesley. 0201544350.
[Jacobson95] Ivar Jacobson, Maria Ericsson, Agneta Jacobson. The Object Advantage. Business Process
Reengineering with Object Technology. 1995. Addison-Wesley. 0201422891.
[Boehm81] Barry Boehm. Software Engineering Economics. 1981. Prentice-Hall PTR. 0138221227.
[Comer95] Douglas Comer. Internetworking with TCP/IP. Principles, Protocols, and Architecture. 3rd edition. 1995. Prentice-Hall. 0132169878.
[Cormen90] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest. Introduction To Algorithms. 1990.
MIT Press. 0262031418.
[Dijkstra76] Edsger Dijkstra. A Discipline of Programming. 1976. Prentice-Hall. 0613924118.
[Gries81] David Gries. The Science of Programming. 1981. Springer-Verlag. 0387964800.
[Holt78] R. C. Holt, G. S. Graham, E. D. Lazowska, M. A. Scott. Structured Concurrent Programming with
Operating Systems Applications. 1978. Addison-Wesley. 0201029375.
[Knuth73] Donald Knuth. The Art of Computer Programming. Fundamental Algorithms. 1973. Addison-Wesley. 0201896834.
[Meyer88] Bertrand Meyer. Object-Oriented Software Construction. 1988. Prentice Hall. 0136290493.
[Parnas72] D. Parnas. On the Criteria to Be Used in Decomposing Systems into Modules. 1053-1058. 1972.
Communications of the ACM.
[Gamma95] Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides. Design Patterns. Elements of
Object-Oriented Software. 1995. Addison-Wesley Professional. 0201633612.
[Larman98] Craig Larman. Applying UML and Patterns. An Introduction to Object-Oriented Analysis and
Design. 1998. Prentice-Hall. 0137488807.
[Lott05] Steven Lott. Building Skills in Object-Oriented Design. Step-by-Step Construction of A Complete
Application. 2005. Steven F. Lott.
[Rumbaugh91] James Rumbaugh, Michael Blaha, William Premerlani, Frederick Eddy, William Lorensen.
Object-Oriented Modeling and Design. 1991. Prentice Hall. 0136298419.
[Geurts91] Leo Geurts, Lambert Meertens, Steven Pemberton. The ABC Programmer's Handbook. 1991.
Prentice-Hall. 0-13-000027-2.
[Gosling96] Gosling, McGilton. Java Language Environment White Paper. 1996. Sun Microsystems.
[Harbison92] Samuel P. Harbison. Modula-3. 1992. Prentice-Hall. 0-13-596396-6.