Gauld.a Learning To Program (Python)
by
Alan Gauld
Introduction
Introduction - What, Why, Who etc.
Why am I writing this?
What will I cover
Who should read it?
Why Python?
Other resources
Concepts
What do I need?
Generally
Python
Tcl/Tk
QBASIC
The Raw Materials
Introduction
Data
Variables
Primitive Data Types
Character Strings
Integers
Real Numbers
Complex or Imaginary Numbers
Boolean Values - True and False
Collections
Python Collections
Other Collection Types
Files
Dates and Times
Complex/User Defined
Python Specific Operators
Conversing with the user
>>> print raw_input("Type something: ")
Advanced Topics
Recursion
What is it?
Recursing over lists
Namespaces
Introduction
Python's approach
And BASIC too
Tcl
Conclusions
A Case Study
Counting lines, words and characters
Counting sentences instead of lines
Turning it into a module
Classes and objects
Text Document
HTML Document
Adding a GUI
References
Books to read
General Programming
Object Oriented Programming
Introduction
Introduction - What, Why, Who etc.
Why am I writing this?
The reason I am creating this tutorial is that there seems to be very little for the absolute beginner to
programming on the Web. Yet the Internet and the Web encourage interest in computers and that interest
naturally leads to a desire to "take control", which means learning to program!
Why me? Well I am a professional programmer who came to programming from an electronic engineering
background. I have used (and continue to use) several computer languages and don't have any personal
interest in promoting any particular tool or language. Oh, and nobody else seemed to be doing it!
I expect the reader of this tutorial to be an experienced user of a computer system, probably MS DOS,
Windows or Unix, although others should be able to cope too. I also expect readers to understand basic
mathematical concepts such as geometric coordinates, sets, and basic algebra. These are all important in
today's programming environments, and many programming concepts are based on these ideas.
I certainly will not be covering issues like how to create or copy text files, how to install software, or the
organization of files on a computer storage system. Frankly, if you need to know those things you probably
are not at the stage of being able to program, regardless of your desire to do so. Find a tutorial for your
computer first, then come back when you're confident with the above concepts.
Why Python?
Python happens to be a nice language to learn. Its syntax is simple and it has some very powerful features
built into the language. It supports lots of programming styles from the very simple through to state of the art
Object Oriented techniques. It runs on lots of platforms - Unix/Linux, MS Windows, Macintosh etc. It also
has a very friendly and helpful user community. All of these are important features for a beginner's language.
Python however is not just a beginner's language. As your experience grows you can keep on using Python
either as an end in itself or as a rapid prototyping language. There are a few things that Python is not well
suited to, but these are comparatively few and far between.
I will also use BASIC for some of the very early examples, then introduce Tcl as an alternative. Why? Well,
if we accept that most Web surfers who are also beginners are using PCs with Microsoft Windows installed,
there is a version of BASIC (QBASIC) already available on the Windows CD ROM (either NT or Win 95/98).
Tcl comes with versions of Python up to V1.5.2 (you effectively get two languages for the price of one, which
in this case is nothing!). After version 2.0 you only get a minimal Tcl install, so to do the examples you will
need to download the official Tcl installer from Scriptics.
Other resources
There are other Web sites trying to do this in other languages. There are also lots of tutorials for those who
already know how to program but want to learn a new language. This section contains links to some of those
that I think are worthwhile!
Concepts
What do I need?
What will we cover?
The character and mindset of a programmer, the programming environments used in
the tutor.
Generally
In principle you don't need anything to do this course other than an Internet-enabled computer, which I
assume you have if you are reading this in the first place! The other thing that is useful is the right mindset
to program. What I mean by that is an innate curiosity about things, coupled with a logical way of thinking.
These are both essential requirements for a successful programmer.
The curiosity factor comes into play in looking for answers to problems and being willing to dig around in
sometimes obscure documents for ideas and information needed to complete a task.
The logical thinking comes into play because computers are intrinsically stupid. They can't really do
anything except add single digits together and move bytes from one place to another. Luckily for us some
talented programmers have written lots of programs to hide this basic stupidity. But of course as a
programmer you may well get into a new situation where you have to face that stupidity in its raw state. At
that point you have to think for the computer. You have to figure out exactly what needs to be done to your
data and when.
So much for the philosophy! However, if you want to get the best from the tutorial you will want to follow
along, either typing in the examples by hand or cutting and pasting from the Web page into your text editor.
Then you can run the programs and see the results. To do that you will need to have Python installed on your
system (and maybe Tcl and QBASIC if you want to try the comparisons).
Python
Python version 1.5.2 is the latest release at the time of writing and comes with Tcl/Tk version 8.0 thrown in
for free. This is because Python's GUI programming system (Tkinter) is built on top of Tcl/Tk. For our
purposes this is 'a good thing'(TM), but it does mean the Python download is very big (about 5MB for the
Windows binary version). For Linux/Unix you can get the source and build it yourself (see your sys admin!),
although Python comes prebuilt in most Linux distributions these days.
http://www.python.org/
Tcl/Tk
As just mentioned, the Python 1.5.2 distribution for Windows comes with Tcl/Tk, so that's no problem. If you
have a different Python version, or are on a platform where it's not included, then you can get Tcl/Tk from
Scriptics:
http://dev.scriptics.com
QBASIC
To be honest I won't be using QBASIC very much, and what I do use will apply to just about any BASIC
variant around. It's possible to get BASIC for almost any platform, but QBASIC is the one I will be using. It
comes on the Windows 95/98 and Windows NT4 CD ROM and I think it's on Windows ME too. If anyone
knows about Windows 2000 please let me know.
D:\other\oldmsdos\qbasic.*
If you can't find it there, use the File Explorer to search for it and simply copy it into a directory on your
PATH. It is a standard executable file running in a DOS box. It responds to mouse commands within its
menu driven environment so you should find it easy to use. There is a help facility too.
The examples that we use should just paste straight into the editor pane and run from the menu.
And that's it. Bring your brain, a sense of humour and start programming....
Points to remember
• You need logical thinking and curiosity to program
• Python, Tcl and QBASIC (on Windows only) are all freely available
What is Programming?
What will we cover?
An introduction to the terminology of computing plus some history and a brief look at
the structure of a computer program.
Back to BASICs
Computer Programming is the art of making a computer do what you want it to do.
At the very simplest level it consists of issuing a sequence of commands to a computer to achieve an
objective. In the Microsoft world MS DOS users used to create text files with lists of commands called BAT
files. These simply executed the sequence of commands as a BATCH, hence the name. You can still produce
these in Windows environments today but in practice they are rarely seen.
For example, you might be producing a document (such as this tutorial) which comprises lots of separate
files. Your word processor may produce backup copies of each file as it saves a new version. At the end of
the day you may want to put the current version of the document (all the latest files) into a 'backup'
directory/folder and finally, to tidy up, delete all the backup files ready to start work the next day. A simple
BAT file to do this would be:
If the file were called SAVE.BAT then at the end of each day I could simply type SAVE at a DOS prompt
and the files would be saved and backups deleted. This is a program.
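For comparison, here is a minimal Python sketch of the same end-of-day routine. It assumes, purely for
illustration, that the document files end in .doc, that the word processor's backup copies end in .bak and that
the backup folder is called backup; the function name end_of_day is hypothetical too.

```python
import glob
import os
import shutil


def end_of_day(folder):
    """Copy the latest .doc files into folder/backup, then delete the .bak copies.

    The .doc and .bak extensions and the 'backup' folder name are illustrative assumptions.
    """
    backup_dir = os.path.join(folder, "backup")
    if not os.path.isdir(backup_dir):
        os.makedirs(backup_dir)

    # Step 1: save the current version of every document file
    for name in glob.glob(os.path.join(folder, "*.doc")):
        shutil.copy(name, backup_dir)

    # Step 2: tidy up by deleting the word processor's backup copies
    for name in glob.glob(os.path.join(folder, "*.bak")):
        os.remove(name)
```

Calling end_of_day('.') at the end of each day would then do the same job as typing SAVE, for the folder
you are working in.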
Note: Users of Linux or other operating systems have their own versions of these files often known as shell
scripts. Unix shell scripts are much more powerful than DOS BAT files, and support most of the
programming techniques that we will be discussing in this course.
A little history
Just as you speak to a friend in a language, so you 'speak' to the computer in a language. The only language
that the computer understands is called binary and there are several different dialects of it, which is why
that cool iMac program won't run on your PC and vice versa. Binary is unfortunately very difficult for
humans to read or write, so we have to use an intermediate language and get it translated into binary for us.
This is rather like watching Clinton and Yeltsin talking at a summit meeting - Clinton speaks, then an
interpreter repeats what has been said in Russian. Yeltsin replies and the interpreter again repeats the
sentence, this time in English.
Surprisingly enough the thing that translates our intermediate language into binary is also called an
interpreter. And just as you usually need a different interpreter to translate English into Russian than you do
to translate Arabic into Russian so you need a different computer interpreter to translate Python into binary
from the one that translates BASIC into binary.
The very first programmers actually had to enter the binary codes themselves; this is known as machine code
programming and is incredibly difficult. The next stage was to create a translator that simply converted
English equivalents of the binary codes into binary, so that instead of having to remember that the code
001273 05 04 meant add 5 to 4, programmers could now write ADD 5 4. This very simple
improvement made life much simpler, and these systems of codes were really the first programming
languages, one for each type of computer. They were known as assembler languages, and Assembler
programming is still used for a few specialized programming tasks today.
Even this was very primitive and still told the computer what to do at the hardware level - move bytes from
this memory location to that memory location, add this byte to that byte etc. It was still very difficult and
took a lot of programming effort to achieve even simple tasks.
Gradually computer scientists developed higher level computer languages to make the job easier. This was
just as well because at the same time users were inventing ever more complex jobs for computers to solve!
This competition between the computer scientists and the users is still going on and new languages keep on
appearing. This makes programming interesting but also makes it important that as a programmer you
understand the concepts of programming as well as the pragmatics of doing it in one particular language.
I'll discuss some of those common concepts next, but we will keep coming back to them as we go through
the course. Programs are built from a few basic structures:
• Sequences of instructions
• Loops
• Branches
• Modules
Along with these structures programs also need a few more features to make them useful:
• Data
• Operations (add, subtract, compare etc)
• Input/Output capability (e.g. to display results)
Once you understand those concepts and how a particular programming language implements them then you
can write a program in that language.
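As a rough preview, the short Python sketch below uses each of those ingredients once: data, an operation, a
sequence, a loop, a branch, a module (math, from Python's standard library) and output. The function name
describe_numbers is invented for this example, and the syntax will be explained properly in later chapters.

```python
import math  # a module: extra tools borrowed from Python's standard library


def describe_numbers(numbers):
    """Return one line of text per number, saying whether it is even or odd."""
    lines = []                          # data: an empty list to collect the results
    for n in numbers:                   # loop: repeat the indented sequence for every number
        root = math.sqrt(n)             # operation: use a tool provided by the module
        if n % 2 == 0:                  # branch: choose between two paths
            kind = "even"
        else:
            kind = "odd"
        lines.append("%d is %s, square root %0.2f" % (n, kind, root))
    return lines


for line in describe_numbers([4, 9]):
    print(line)                         # output: display the results
```

The statements inside the function run one after another, which is the sequence structure in action.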
We already said that programming was the art of making a computer do what you want, but what is a
program?
In fact there are two distinct concepts of a program. The first is the one perceived by the user: an executable
file that is installed and can be run repeatedly to perform a task. For example, users speak of running their
word processor program. The other concept is the program as seen by the programmer: the text file of
instructions to the computer, written in some programming language, that can be translated into an
executable file. So when you talk about a program, always be clear about which concept you mean.
Basically a programmer writes a program in a high level language which is interpreted into the bytes that the
computer understands. In technical speak the programmer generates source code and the interpreter
generates object code. Sometimes object code has other names like: P-Code, binary code or machine code.
The interpreter has a couple of names, one being the interpreter and the other being the compiler. These
terms actually refer to two different techniques of generating object code from source code. It used to be the
case that compilers produced object code that could be run on its own (an executable file - another term)
whereas an interpreter had to be present to run its program as it went along. The difference between these
terms is now blurring however since some compilers now require interpreters to be present to do a final
conversion and some interpreters simply compile their source code into temporary object code and then
execute it.
From our perspective it makes no real difference: we write source code and use a tool to allow the computer
to read and execute it.
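Python itself is an example of this blurring: behind the scenes it compiles source code into an intermediate
'byte code' object and then executes that. The sketch below makes the two steps visible using the built-in
functions compile() and exec(); the variable names are just for illustration, and in everyday programming you
never need to do this by hand.

```python
# The source code, held as an ordinary string of text
source = "total = 6 + 5"

# Translate it into a code object (Python's byte code, a form of object code)
code_object = compile(source, "<example>", "exec")

# Execute the compiled code, keeping any variables it creates in a dictionary
namespace = {}
exec(code_object, namespace)

print(namespace["total"])   # prints 11
```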
Batch programs
These are typically started from a command line (or automatically via a scheduler utility) and tend to follow
a fixed pattern: start, read some input, process it, write out the results, then stop.
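Here is a minimal Python sketch of a batch program, assuming the common read, process, write shape: it reads
an input file, processes every line (here it just converts the text to upper case) and writes an output file, with
no interaction along the way. The function name run_batch and the upper-case step are purely illustrative.

```python
def run_batch(input_path, output_path):
    """Read every line of input_path, process it and write the results to output_path."""
    # Read: load the whole input file
    with open(input_path) as infile:
        lines = infile.readlines()

    # Process: transform each line (a stand-in for real work)
    results = [line.upper() for line in lines]

    # Write: save the results, then stop
    with open(output_path, "w") as outfile:
        outfile.writelines(results)
    return len(results)
```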
Event driven programs
Most GUI systems (and embedded control systems like your microwave, camera etc) are event driven.
That is the operating system sends events to the program and the program responds to these as they arrive.
Events can include things a user does - like clicking the mouse or pressing a key - or things that the system
itself does like updating the clock or refreshing the screen.
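To make the idea concrete, here is a toy event loop in Python. Real GUI toolkits (Tkinter, for instance) supply
their own event loop; this sketch simply takes hypothetical events from a queue and hands each one to the
handler registered for its kind. The event names and handler functions are invented for illustration.

```python
from collections import deque


# Hypothetical handlers: each one responds to one kind of event
def on_key(data):
    return "key pressed: " + data


def on_click(data):
    return "mouse clicked at " + data


handlers = {"key": on_key, "click": on_click}


def event_loop(events):
    """Deliver each (kind, data) event to its handler until none are left."""
    queue = deque(events)
    responses = []
    while queue:
        kind, data = queue.popleft()      # take the next event as it "arrives"
        handler = handlers.get(kind)
        if handler is not None:           # ignore events nobody has asked to handle
            responses.append(handler(data))
    return responses


for response in event_loop([("key", "A"), ("tick", "12:00"), ("click", "(10, 20)")]):
    print(response)
```

The program never decides what happens next; it just waits for events and reacts to them.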
Points to remember
• Programs control the computer
• Programming languages allow us to 'speak' to the computer at a level that is
closer to how humans think than how computers 'think'
• Programs operate on data
• Programs can be either Batch oriented or Event driven
Getting Started
What will we cover?
How to start Python and what an error message looks like - just in case...
For the next set of exercises I will assume you have a properly installed version of Python on your computer.
If not, go fetch the latest version from the Python web site and follow the install instructions for your
platform.
Now from a command prompt type python and the Python prompt should appear looking something like
this:
Don't worry about the exact meaning here; just look at the structure.
The '>>> print ...' line is the erroneous command.
The next two lines describe where the error occurred:
'line 1 in ?' means line 1 in the command we are typing. If it were a longer program stored in a source file,
the question mark would be replaced by the file name.
The 'TypeError...' line tells you what the interpreter thinks is wrong, and sometimes there will be a caret
character (^) pointing to the part of the line that Python thinks is at fault.
Unfortunately this will often be wrong; remember computers are dumb!
Use the error information to figure out what's happening. Remember it's most likely to be you at fault, not the
computer. Probably you just mistyped something or forgot a quote sign or something similar. Check carefully.
In case you are wondering, the mistake I made was trying to add a number to a character string. You're not
allowed to do that so Python objected and told me there was a TypeError. You'll need to wait till we get to
the bit about 'Data' to understand what types are all about....
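If you want to see this complaint without your program stopping, you can catch the error. The sketch below
uses try/except, which is covered properly much later, and the exact wording of the message varies between
Python versions, so don't rely on it. The string "Total: " is just an example.

```python
# Adding a number to a character string is not allowed, so Python raises a TypeError
try:
    result = "Total: " + 5
except TypeError as err:
    result = None
    message = str(err)            # the interpreter's explanation of what went wrong
    print("TypeError: " + message)

# One fix: convert the number into a string before joining them
fixed = "Total: " + str(5)
print(fixed)                      # prints Total: 5
```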
Now we are ready to start creating some very simple Python programs.
Points to remember
• Start Python by typing python at a command prompt
• Error messages are nothing to be scared of, read them carefully, they usually
give a clue as to why you got them.
• But it's only a clue... if in doubt check the lines immediately before the
reported line.
The Basics
Simple Sequences
What will we cover?
Single commands, the use of Python as a calculator, using brackets to get the correct
result and using format strings to combine text and numbers. Finally we see how to
quit Python from within a program.
A simple sequence of instructions is the most basic program you can write. The simplest sequence is one
containing a single command. We will try out some of these now. The heading will describe what you
should type at the '>>>' Python prompt, the following paragraph will explain what happens.
The print command is the way to get Python to display its results to you. In this case it is printing the
sequence of characters H,e,l,l,o, ,t,h,e,r,e,!. Such a sequence of characters is known in
programming circles as a string of characters or a character string or just a plain string.
You signify a string by surrounding it in quotes. You can use either single quotes (as above) or double
quotes: "a string". This allows you to include one type of quote within a string which is surrounded by the
other type, which is useful for apostrophes:
>>> print "Monty Python's Flying Circus has a ' within it..."
>>>print 6 + 5
Here we have printed the result of an arithmetic operation - we added six and five. Python recognized the
numbers as such and the plus sign and did the sum for us. It then printed the result.
So straight away you have a use for Python: it's a handy 'pocket calculator'! Try a few more sums, using some
other arithmetic operators:
• subtract (-)
• multiply (*)
• divide (/)
Notice the way I used brackets to group the numbers together. What happens if you type the same sequence
without the brackets? You will probably get a different answer. This is because Python evaluates the
multiplication and division before the addition and subtraction. This is usually what you would expect
mathematically speaking, but it may not be what you expect as a programmer! All programming languages
have rules to determine the sequence of evaluation of operations, and this is known as operator precedence.
You will need to look at the reference documentation for each language to see how it works. With Python it's
usually what logic and intuition would suggest, but occasionally it won't be...
As a general rule it's safest to include the brackets to make sure you get what you want when dealing with
long series of sums like this.
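A few lines you can try in Python to see operator precedence at work. The numbers are arbitrary; the point is
that multiplication happens before addition unless brackets say otherwise, and operators of equal precedence
are worked left to right.

```python
without_brackets = 2 + 3 * 4     # multiplication first: 2 + 12
with_brackets = (2 + 3) * 4      # brackets first: 5 * 4
left_to_right = 10 - 4 - 3       # same precedence, so (10 - 4) - 3

print(without_brackets)          # prints 14
print(with_brackets)             # prints 20
print(left_to_right)             # prints 3
```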
results in a whole number (integer) result (i.e. 2). This is because Python sees that the numbers are whole
numbers and assumes you want to keep them that way. If you want decimal fractions as a result simply write
one number as a decimal:
% is known as the modulus or mod operator and in other languages is often seen as MOD or similar.
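The sketch below puts whole-number division, fractional division and the modulus operator side by side. One
caution: it is written to behave the same in today's Python as in the versions this tutorial describes. In Python 3
a single / always gives a fractional result, even for two whole numbers, and // (available since Python 2.2) is
the operator that keeps the whole-number result described above.

```python
whole = 7 // 2          # whole-number (integer) division: 3
fraction = 7 / 2.0      # one number written as a decimal, so the result is 3.5
remainder = 7 % 2       # modulus: what is left over after the division, 1

print(whole)
print(fraction)
print(remainder)

# Quotient and remainder always fit back together: 2 * 3 + 1 == 7
assert 2 * whole + remainder == 7
```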
You've seen that we can print strings and numbers. Now we combine the two in one print statement,
separating them with a comma. We can extend this feature by combining it with a useful Python trick for
outputting data called a format string:
In this command the format string contains '%' markers within it. The letter 'd' after the % tells Python that a
'decimal number' should be placed there. The values to fill in the markers are obtained from the values inside
the bracketed expression following the % sign on its own.
There are other letters that can be placed after the % markers. Some of these include:
• %s - for string
• %x - for hexadecimal number
• %0.2f - for a real number with exactly 2 decimal places
• %04d - pad the number out to 4 digits with 0's
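Those markers can be tried out like this. The values are arbitrary examples, and each comment shows the text
the format operation produces.

```python
print("%d apples" % 6)                      # 6 apples
print("Hello %s" % "there")                 # Hello there
print("255 in hex is %x" % 255)             # 255 in hex is ff
print("Pi is roughly %0.2f" % 3.14159)      # Pi is roughly 3.14
print("Price: %0.2f" % 2.5)                 # Price: 2.50 (always two decimal places)
print("Order number %04d" % 42)             # Order number 0042

# Several values are supplied in brackets after the lone % sign
print("%d plus %d is %d" % (6, 5, 6 + 5))   # 6 plus 5 is 11
```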
In fact you can print any Python object with the print command. Sometimes the result will not be what you
hoped for (perhaps just a description of what kind of object it is) but you can always print it.
>>>import sys
Now this is a strange one. If you've tried it you'll see that it apparently does nothing. But that's not really
true. To understand what happened we need to look at the architecture of Python (for non-Python
programmers, bear with me; there will be a similar mechanism available to you too!)
When you start Python there are a bunch of commands available to you called built-ins, because they are
built into the Python core. However, Python can extend the list of commands available by incorporating
extension modules. It's a bit like buying a new tool in your favourite DIY shop and adding it to your
toolbox. The tool is the sys part and the import operation puts it into the toolbox.
In fact what this command does is make available a whole bunch of new 'tools' in the shape of Python
commands which are defined in the sys module. (Most modules live in files ending in '.py'; sys is a special
case that is built into the interpreter itself.) This is how Python is extended to do all sorts of clever
things that are not built into the basic system. You can even create your own modules and import and use
them, just like the modules provided with Python when you installed it.
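You can look inside a module once it has been imported. This sketch checks a few of the tools that import sys
puts in the toolbox; dir() and callable() are built-ins, and exit and platform are standard parts of the sys
module. The exact value of sys.platform depends on your system.

```python
import sys

# dir() lists the names a module provides; here we just check a few of them
tools = dir(sys)
print("exit" in tools)          # True: the exit command used below
print(callable(sys.exit))       # True: exit is a function we can call
print(sys.platform)             # the operating system, e.g. 'win32' or 'linux'
```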
>>>sys.exit()
Whoops! What happened there? Simply that we executed the exit command defined in the sys module.
That command causes Python to exit. (Note: Normally you exit Python by typing the End Of File(EOF)
character at the >>> prompt - CTRL-Z on DOS or CTRL-D on Unix)
Notice that exit had two brackets after it. That's because exit is a function defined in sys, and when we call a
Python function we need to supply the parentheses even if there's nothing inside them!
Try typing sys.exit without the brackets. Python responds by telling you that exit is a function rather than by
executing it!
One final thing to notice is that the last two commands are actually only useful in combination. That is, to
exit from Python other than by typing EOF you need to type:
import sys
sys.exit()
This is a sequence of two commands! Now we're getting closer to real programming....
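If you are curious what sys.exit() actually does, it raises a special SystemExit exception, which the interpreter
treats as a request to stop. The sketch below catches that exception so you can run it without losing your
Python session; catching SystemExit like this is only for demonstration, and the variable name stopped is
arbitrary.

```python
import sys

print(sys.exit)        # without brackets: Python just describes the function

try:
    sys.exit()         # with brackets: the function is actually called
except SystemExit:
    stopped = True     # Python would have quit here if we had not caught it
    print("sys.exit() asked Python to stop")
```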
Using Tcl
We can also type simple commands like this into Tcl. The Tcl interpreter is started by typing tclsh80
(assuming you have Tcl v8.0) at a DOS prompt. The command prompt is a '%' sign. Try the following
examples:
Note that in the last example the section in square brackets is evaluated first and the result passed to the puts
command. Unlike Python, you can't assume that puts will attempt to interpret what you mean; it expects a
character string and it's up to you to ensure it gets one.
And BASIC too...
To start BASIC type QBASIC at the DOS prompt. In this case a whole programming environment will start.
Get rid of the welcome dialog etc and in the edit window you can type commands and then run whatever is
in the window using the Run menu. This has the advantage that the environment allows you to edit the
commands, and even does some checks on the text as you enter it.
That's our first look at programming; it wasn't too painful, was it? Before we continue, though, we need to take a
look at the raw materials of programming: data and what we can do with it.
Points to remember
• Even a single command is a program
• Python does math almost the way you'd expect
• To get a fractional result you must use a fractional number
• You can combine text and numbers using the % format operator
• Quit with import sys; sys.exit()
The Raw Materials
What will we cover?
• What Data is
• What Variables are
• Data Types and what to do with them
• Defining our own data types
Introduction
In any creative activity we need three basic ingredients: tools, materials and techniques. For example when I
paint the tools are my brushes, pencils and palettes. The techniques are things like ‘washes’, wet on wet,
blending, spraying etc. Finally the materials are the paints, paper and water. Similarly when I program, my
tools are the programming languages, operating systems and hardware. The techniques are the programming
constructs that we discussed in the previous section and the material is the data that I manipulate. In this
chapter we look at the materials of programming.
This is quite a long section and by its nature you might find it a bit dry. The good news is that you don’t need
to read it all at once. The chapter starts off by looking at the most basic data types available, then moves on
to how we handle collections of items and finally looks at some more advanced material. It should be
possible to drop out of the chapter after the collections material, cover a couple of the following chapters and
then come back to this one as we start to use the more advanced bits.
Data
Data is one of those terms that everyone uses but few really understand. My dictionary defines it as:
That's not too much help but at least gives a starting point. Let’s see if we can clarify things by looking at
how data is used in programming terms. Data is the “stuff”, the raw information, that your program
manipulates. Without data a program cannot perform any useful function. Programs manipulate data in many
ways, often depending on the type of the data. Each data type also has a number of operations - things that
you can do to it. For example we’ve seen that we can add numbers together. Addition is an operation on the
number type of data. Data comes in many types and we’ll look at each of the most common types and the
operations available for that type.
Variables
Data is stored in the memory of your computer. You can liken this to the big wall full of boxes used in mail
rooms to sort the mail. You can put a letter in any box but unless the boxes are labelled with the destination
address it’s pretty meaningless. Variables are the labels on the boxes in your computer's memory.
Knowing what data looks like is fine so far as it goes but to manipulate it we need to be able to access it and
that’s what variables are used for. In programming terms we can create instances of data types and assign
them to variables. A variable is a reference to a specific area somewhere in the computer’s memory. These
areas hold the data. In some computer languages a variable must match the type of data that it points to. Any
attempt to assign the wrong type of data to such a variable will cause an error. Some programmers prefer this
type of system, known as static typing, because it can prevent some subtle bugs which are hard to detect.
In Python a variable takes the type of the data assigned to it. It will keep that type and you will be warned if
you try to mix data in strange ways - like trying to add a string to a number. (Recall the example error
message? It was an example of just that kind of error.) We can change the type of data that a variable points
to by reassigning the variable.
>>> q = 7 # q is now a number
>>> print q
7
>>> q = "Seven" # reassign q to a string
>>> print q
Seven
Note that q was set to point to the number 7 initially. It maintained that value until we made it point at the
character string "Seven". Thus, Python variables maintain the type of whatever they point to, but we can
change what they point to simply by reassigning the variable. At that point the original data is 'lost' and
Python will erase it from memory (unless another variable points at it too). This is known as garbage
collection.
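The sketch below repeats the q example and adds a second variable, r, to show the "unless another variable
points at it too" case. type() is a built-in that reports what kind of data a variable currently points at; the
names q and r and the strings are arbitrary.

```python
q = 7                  # q points at a number
print(type(q))         # shows that q currently points at an integer

q = "Seven"            # reassign q: it now points at a string
print(type(q))         # shows that q now points at a string

r = "Eight"            # r points at a new string
q = r                  # q points at the same string, so "Seven" has no labels left
r = "Nine"             # r moves on, but "Eight" survives because q still points at it
print(q)               # prints Eight
```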
Garbage collection can be likened to the mailroom clerk who comes round once in a while and removes any
packets that are in boxes with no labels. If he can't find an owner or address on the packets he throws them in
the garbage. Let’s take a look at some examples of data types and see how all of this fits together.
Character Strings
We've already seen these. They are literally any string or sequence of characters that can be printed on your
screen. (In fact there can even be non-printable control characters too.)
'Here is a string'
One special use of the latter form is to build in documentation for Python functions that we create ourselves -
we'll see this later.
You can access the individual characters in a string by treating it as an array of characters (see arrays below).
There are also usually some operations provided by the programming language to help you manipulate
strings - find a sub string, join two strings, copy one to another etc.
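A quick sketch of those string operations in Python: indexing picks out one character (counting from 0),
find() searches for a sub string and join() glues several strings together with a separator. These are standard
Python string features; the sample text is arbitrary.

```python
s = "Here is a string"

first = s[0]                         # 'H': characters are numbered from 0
last = s[-1]                         # 'g': negative numbers count from the end
position = s.find("is")              # 5: where the sub string starts (-1 if not found)
joined = "-".join(["a", "b", "c"])   # 'a-b-c': strings joined with a separator

print(first)
print(last)
print(position)
print(joined)
```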
String Operators
There are a number of operations that can be performed on strings. Some of these are built in to Python but
many others are provided by modules that you must import (as we did with sys in the Simple Sequences
section).
String operators
Operator    Description
S1 + S2     Concatenation of S1 and S2
S1 * N      N repetitions of S1
Notice that the last two examples produced the same output.
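The two operators from the table, tried out in Python (the sample strings are arbitrary). Note that * is done
before +, just as with numbers.

```python
s1 = "Hello "
s2 = "world"

print(s1 + s2)        # concatenation: Hello world
print("Ha" * 3)       # repetition: HaHaHa
print(s1 * 2 + s2)    # repetition first, then concatenation: Hello Hello world
```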
In BASIC, if a variable is a string variable you must terminate the name with a $.
Having done that you cannot ever assign a number to it. Similarly, if it is an integer
variable (ends in %) you cannot assign a string to it. BASIC does allow 'anonymous
variables' that don't end in anything. These can only store numbers however, either
real or integer numbers but only numbers. Here is an example of a string variable in
BASIC:
DIM MyString$
PRINT MyString$
Tcl Strings
Tcl uses strings internally for everything. From the user's point of view, however, this is not usually obvious.
When explicitly dealing with a string you surround it in double quotes. To assign a value to a variable in Tcl
use the set command, and to read a string variable (or indeed any variable in Tcl) put a '$' in front of the
name, like so:
Note: in both Tcl and BASIC only double quotes can be used for strings.
Integers
Integers are whole numbers from a large negative value through to a large positive value. That’s an
important point to remember. Normally we don’t think of numbers being restricted in size but on a computer
there are upper and lower limits. The size of this upper limit is known as MAXINT and depends on the
number of bits used on your computer to represent a number. If 32 bits are used, MAXINT is 2,147,483,647
(2 to the power 31, minus 1), or around 2 billion.
Numbers with positive and negative values are known as signed integers. You can also get unsigned integers
which are restricted to positive numbers, including zero. This means there is a bigger maximum number
available, around 2 * MAXINT or 4 billion on a 32-bit computer, since we can use the space previously
used for representing negative numbers to represent more positive numbers.
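The limits quoted above can be checked with a little Python arithmetic. This sketch assumes 32-bit integers, and the variable names are illustrative:

```python
BITS = 32

max_signed = 2 ** (BITS - 1) - 1     # half the bit patterns are used for negative numbers
min_signed = -(2 ** (BITS - 1))      # the most negative value
max_unsigned = 2 ** BITS - 1         # every bit pattern is used for zero or a positive value

print(max_signed)     # 2147483647
print(min_signed)     # -2147483648
print(max_unsigned)   # 4294967295
```

The unsigned maximum works out at exactly 2 * MAXINT + 1, which is where the "around 4 billion" figure comes from.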
Because integers are restricted in size to MAXINT, adding two integers together where the total is greater
than MAXINT produces a wrong answer. On some systems/languages the wrong value is just returned as
is (usually with some kind of secret flag raised that you can test if you think it might have been set). Normally
an error condition is raised and either your program can handle the error or the program will exit. Early
versions of Python adopt this latter approach while Tcl adopts the former; from version 2.2 onwards Python
instead switches automatically to integers of unlimited size, so the problem does not arise. BASIC throws an
error but provides no way to catch it (at least I don't know how!)
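Modern Python's own integers never overflow, but the "wrong value returned as is" behaviour can be imitated. This sketch assumes 32-bit two's complement arithmetic, which is common but not universal, and add_32bit is a hypothetical helper written only for this illustration:

```python
MAXINT = 2 ** 31 - 1

def add_32bit(a, b):
    """Add two integers the way a 32-bit signed register would,
    silently wrapping round instead of reporting an error."""
    total = (a + b) & 0xFFFFFFFF     # keep only the lowest 32 bits
    if total > MAXINT:               # top bit set means a negative value
        total -= 2 ** 32
    return total

print(MAXINT + 1)              # 2147483648: Python's integers just keep growing
print(add_32bit(MAXINT, 1))    # -2147483648: the simulated 32-bit sum wraps round
```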
Arithmetic Operators
We've already seen most of the arithmetic operators that you need in the 'Simple Sequences' section,
however to recap:
Operator   Description
M+N        Addition of M and N
M-N        Subtraction of N from M
M*N        Multiplication of M and N
M/N        Division, either integer or floating point result depending on the types of M and N.
           If either M or N are real numbers (see below) the result will be real. (This is the
           Python 2 behaviour; in Python 3, / always gives a real result and // gives
           whole-number (floor) division.)
M%N        Modulo: find the remainder of M divided by N
M**N       Exponentiation: M to the power N
We haven’t seen the last one before so let’s look at an example of creating some integer variables and using
the exponentiation operator:
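Here is a sketch of that in Python; the names i1, i2, i3 and remainder are just illustrative:

```python
i1 = 2            # create an integer and assign it to i1
i2 = 4
i3 = i1 ** i2     # i1 to the power i2, i.e. 2 to the power 4
print(i3)         # 16

remainder = 17 % 5    # the modulo operator from the table: 17 = 3*5 + 2
print(remainder)      # 2
```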
BASIC Integers
BASIC has some extra rules around integers. To declare an integer variable in BASIC you can either use a
plain unadorned name or you can signal to BASIC that it is an integer we wish to store (this will be slightly
more efficient). We do this by ending the name with '%':
i% = 7
PRINT 2 * i%
i% = 4.5
PRINT 2 * i%
Notice that the assignment of 4.5 to i% seemed to work but only the integer part was actually assigned. This
is reminiscent of the way Python dealt with division of integers. All programming languages have their own
little idiosyncrasies like this!
Tcl Numbers
As mentioned earlier, Tcl stores everything internally as strings; however, this doesn't really make any
difference to the user because Tcl converts the values into numbers and back again under the covers, as it
were. Thus all the restrictions on number sizes still apply.
Using numbers in Tcl is slightly more complex than in most languages since to do any calculations you have
to signal to the interpreter that a calculation is needed. You do that with the expr command:
% puts [expr 6 + 5]
11
Tcl sees the square brackets and evaluates that part first, as if it had been typed at the command line. In doing
so it sees the expr command and does the calculation. The result is then printed to the screen by puts. If you
try to print the sum directly, without expr, Tcl just prints out the text "6 + 5":
% puts "6 + 5"
6 + 5
Real Numbers
These are numbers with a fractional part, also called floating point numbers. They can represent very large
numbers, much bigger than MAXINT, but with less precision. That is to say that 2 real numbers which should
be identical may not seem to be when compared by the computer. This is because the computer only
approximates some of the lowest details. Thus a calculation that should give 4.0 could produce 3.9999999....
or 4.000000....01. These approximations are close enough for most purposes but occasionally they become
important! If you get a funny result when using real numbers, bear this in mind.
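A classic illustration, assuming a Python 3.5 or later interpreter (for math.isclose): 0.1 and 0.2 cannot be stored exactly in binary, so their sum is not exactly 0.3. Comparing within a small tolerance is the usual remedy:

```python
import math

total = 0.1 + 0.2
print(total == 0.3)               # False: the stored values are slightly off
print(total)                      # 0.30000000000000004
print(math.isclose(total, 0.3))   # True: equal within a small tolerance
```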
Floating point numbers have the same operations as integers with the addition of the capability to truncate
the number to an integer value.
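In Python, truncation can be done with the built-in int() or with math.trunc(). Both simply discard the fractional part, so negative numbers move towards zero; that differs from rounding down. A brief sketch with an arbitrary value:

```python
import math

x = -3.7
truncated = int(x)              # -3: the fractional part is thrown away
also_truncated = math.trunc(x)  # -3 as well
rounded_down = math.floor(x)    # -4: rounding down is not the same as truncating

print(truncated)
print(rounded_down)
```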
Complex or Imaginary Numbers
If you have a scientific or mathematical background you may be wondering about complex numbers. If you
don't, you may not even have heard of complex numbers! Anyhow, some programming languages, including
Python, provide built-in support for the complex type while others provide a library of functions which can
operate on complex numbers. And before you ask, the same applies to matrices too.
In Python a complex number is written as (real+imaginaryj), for example:
>>> M = (2+4j)
>>> N = (7+6j)
>>> print M + N
(9+10j)
Boolean Values - True and False
Boolean values are sometimes known as "truth values" because they are used to test whether something is
true or not. For example, if you write a program to back up all the files in a directory you might back up each
file then ask the operating system for the name of the next file. If there are no more files to save it will return
an empty string. You can then test to see if the name is an empty string and store the result as a boolean
value (true if it is empty). You'll see how we would use that result later on in the course.
Operator           Description   Effect
A and B            AND           True if A and B are both True, False otherwise.
A or B             OR            True if either or both of A and B are True. False if both A and B are False.
A == B             Equality      True if A is equal to B.
A != B or A <> B   Inequality    True if A is NOT equal to B. (<> is an older Python 2 spelling; Python 3
                                 accepts only !=.)
not B              Negation      True if B is not True.
Note: the last one operates on a single value, the others all compare two values.
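Tying this back to the backup example above, here is a small Python sketch. The name next_name stands in for whatever the operating system would return and is purely hypothetical:

```python
next_name = ""                     # pretend the OS reported no more files
no_more_files = (next_name == "")  # store the result of the test as a boolean

print(no_more_files)               # True
print(not no_more_files)           # False
print(no_more_files and 1 < 2)     # True: both sides are true
print(no_more_files or 5 != 5)     # True: at least one side is true
```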
Collections
Computer science has built a whole discipline around studying collections and their various behaviours.
Sometimes collections are called containers. In this section we will look first of all at the collections
supported in Python then we’ll conclude with a brief summary of some other collection types you might
come across in other languages.
Python Collections
List
A list is a sequence of items. What makes it different from an array is that it can keep on growing - you just
add another item. But it's not usually indexed so you have to find the item you need by stepping through the
list from front to back checking each item to see if it's the item you want. Both Python and Tcl have lists
built into the language. In BASIC it's harder and we have to do some tricky programming to simulate them.
BASIC programmers usually just create very big arrays instead. Python also allows you to index its lists. As
we will see this is a very useful feature.
List operations
Python provides many operations on collections. Nearly all of them apply to Lists and a subset apply to other
collection types, including strings which are just a special type of list of characters. To create and access a
list in Python we use square brackets. You can create an empty list by using a pair of square brackets with
nothing inside, or create a list with contents by separating the values with commas inside the brackets:
>>> aList = []
>>> another = [1,2,3]
>>> print another
[1, 2, 3]
We can access the individual elements using an index number, where the first element is 0, inside square
brackets:
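For example, using the list another from above (re-created here so the snippet stands alone):

```python
another = [1, 2, 3]
print(another[0])   # 1: the first element
print(another[2])   # 3: the third element
```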
We can also change the values of the elements of a list in a similar fashion:
>>> another[2] = 7
>>> print another
[1, 2, 7]
You can use negative index numbers to access members from the end of the list. This is most commonly
done using -1 to get the last item:
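A sketch using the list as it stands after the change above:

```python
another = [1, 2, 7]
print(another[-1])   # 7: the last item
print(another[-2])   # 2: the second to last item
```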
We can also add new elements to the end of a list using the append() operator:
>>> aList.append(42)
>>> print aList
[42]
We can even hold one list inside another, thus if we append our second list to the first:
>>> aList.append(another)
>>> print aList
[42, [1, 2, 7]]
Notice how the result is a list of two elements but the second element is itself a list (as shown by the []’s
around it). This is useful since it allows us to build up representations of tables or grids using a list of lists.
We can then access the element 7 by using a double index:
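In Python the double index looks like this; the sketch rebuilds aList as shown above so that it runs on its own:

```python
another = [1, 2, 7]
aList = [42, another]
print(aList[1][2])   # 7
```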
The first index, 1, extracts the second element which is in turn a list. The second index, 2, extracts the third
element of the sublist.
The opposite of adding elements is, of course, removing them and to do that we use the del command:
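A minimal sketch, starting from the list of lists built above:

```python
aList = [42, [1, 2, 7]]
del aList[1]        # remove the element at index 1 (the sublist)
print(aList)        # [42]
```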
If we want to join two lists together to make one we can use the same concatenation operator ‘+’ that we saw
for strings:
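For instance (newList is an illustrative name):

```python
aList = [42]
another = [1, 2, 7]
newList = aList + another   # a new list; the two originals are left unchanged
print(newList)              # [42, 1, 2, 7]
```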
In the same way we can apply the repetition operator to populate a list with multiples of the same value:
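For example, to create a list of five zeroes (zeroes is an illustrative name):

```python
zeroes = [0] * 5
print(zeroes)   # [0, 0, 0, 0, 0]
```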
Finally, we can determine the length of a list using the built-in len() function:
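A quick sketch:

```python
aList = [42, 1, 2, 7]
print(len(aList))   # 4
print(len([]))      # 0: an empty list has no elements
```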
Tcl Lists
Tcl also has a built-in list type and a variety of commands for operating on these lists. These commands are
identifiable by the 'l' prefix, for example linsert, lappend, lindex, etc. An example of creating a
simple Tcl list and accessing a member follows:
% set L [list 1 2 3]
% puts [lindex $L 2]
3
Tuple
Not every language provides a tuple construct but in those that do it’s extremely useful. A tuple is really just
an arbitrary collection of values which can be treated as a unit. In many ways a tuple is like a list, but with
the significant difference that tuples are immutable which is to say that you can’t change them nor append to
them once created. In Python, tuples are simply represented by parentheses containing a comma separated
list of values, like so:
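A brief sketch (aTuple is an illustrative name). Trying to change an element raises a TypeError, which the try/except below catches so that the snippet runs to completion:

```python
aTuple = (1, 3, 5)
print(aTuple[1])      # 3: square brackets index a tuple, just as with a list
print(len(aTuple))    # 3: many list operations also work on tuples

try:
    aTuple[2] = 7     # tuples are immutable, so this fails
except TypeError:
    print("can't change a tuple")
```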
The main things to remember are that while parentheses are used to define the tuple, square brackets are
used to index it and you can’t change a tuple once it's created. Otherwise most of the list operations also
apply to tuples.
Dictionary or Hash
A dictionary as the name suggests contains a value associated with some key, in the same way that a literal
dictionary associates a meaning with a word. The value can be retrieved by ‘indexing’ the dictionary with
the key. Unlike a literal dictionary the key doesn’t need to be a character string (although it often is) but can
be any immutable type including numbers and tuples. Similarly the values associated with the keys can be
any kind of Python data type. Dictionaries are usually implemented internally using an advanced
programming technique known as a hash table. For that reason a dictionary may sometimes be referred to as
a hash. This has nothing to do with drugs!
Because access to the dictionary values is via the key, each key can appear only once; storing a value under an existing key replaces the old value.
Dictionaries are immensely useful structures and are provided as a built-in type in Python although in many
other languages you need to use a module or even build your own. We can use dictionaries in lots of ways
and we'll see plenty of examples later, but for now, here's how to create a dictionary in Python, fill it with some
entries and read them back:
>>> aDict = {}
>>> aDict['boolean'] = "A value which is either true or false"
>>> aDict['integer'] = "A whole number"
>>> print aDict['boolean']
A value which is either true or false
Notice that we initialise the dictionary with braces, then use square brackets to assign and read the values.
Due to their internal structure dictionaries do not support very many of the collection operators that we’ve
seen so far. None of the concatenation, repetition or appending operations work. To assist us in accessing the
dictionary keys there is a function that we can use, keys(), which returns a list of all the keys in a
dictionary. (In Python 3 it returns a 'view' of the keys, which list() turns into a real list.)
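A short sketch using keys(), with the same entries as the example above. Because Python 3 returns a view rather than a list, the keys are passed through sorted(), which returns a list in either version; the in test is an extra, commonly used way to check for a key:

```python
aDict = {}
aDict['boolean'] = "A value which is either true or false"
aDict['integer'] = "A whole number"

print(sorted(aDict.keys()))   # ['boolean', 'integer']
print('integer' in aDict)     # True: check that a key exists before looking it up
```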
If you're getting a bit fed up, you can jump to the next chapter at this
point. Remember to come back and finish this one when you start to
come across types of data we haven't mentioned so far.
Other Collection Types
Array or Vector
A list of items which are indexed for easy and fast retrieval. Usually you have to say up front how many
items you want to store. Let's say I have an array called A; then I can extract an item by its index, for example
A[3]. Whether that is the 3rd or the 4th item depends on whether the language counts from 1 or from 0.
Arrays are fundamental in BASIC; in fact they are the only built-in collection type. In Python arrays
are simulated using lists and in Tcl arrays are implemented using dictionaries.
MyArray(1) = 27
MyArray(2) = 50
FOR i =1 TO 5
PRINT MyArray(i)
NEXT i
Notice that the index starts at 1 in this BASIC example. This is unusual; in most languages the index starts at 0.
There are no other operations on arrays, all you can do is create them, assign values and read values.
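Following the idea that Python simulates arrays with lists, here is a sketch mirroring the BASIC example. Pre-filling five slots with zero is an assumption that stands in for declaring the size up front, and Python counts from 0:

```python
MyArray = [0] * 5   # five slots, indexes 0 to 4
MyArray[0] = 27
MyArray[1] = 50

for i in range(5):
    print(MyArray[i])   # 27, 50, then three zeroes
```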
Stack
Think of a stack of trays in a restaurant. A member of staff puts a pile of clean trays on top and these are
removed one by one by customers. The trays at the bottom of the stack get used last (and least!). Data stacks
work the same way: you push an item onto the stack or pop one off. The item popped is always the last one
pushed. This property of stacks is sometimes called Last In First Out or LIFO. One useful property of stacks
is that you can reverse a list of items by pushing the list onto the stack then popping it off again. The result
will be the reverse of the starting list. Stacks are not built in to Python, Tcl or BASIC. You have to write
some program code to implement the behaviour. Lists are usually the best starting point since like stacks
they can grow as needed.
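A minimal sketch of a stack built on a Python list, as suggested: append() pushes onto the top and pop() removes the item pushed most recently. The second half shows the reversing trick; all names are illustrative:

```python
stack = []
for tray in ["first", "second", "third"]:
    stack.append(tray)          # push

print(stack.pop())              # third: the last one pushed comes off first

items = [1, 2, 3, 4]
holder = []
for item in items:
    holder.append(item)         # push every item onto the stack
reversed_items = []
while holder:                   # an empty list counts as false, ending the loop
    reversed_items.append(holder.pop())   # pop them off again
print(reversed_items)           # [4, 3, 2, 1]
```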
Bag
A bag is a collection of items with no specified order and it can contain duplicates. Bags usually have
operators to enable you to add, find and remove items. In Python and Tcl bags are just lists. In BASIC you
must build the bag from a large array.
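Since a Python bag is just a list, the ordinary list operations cover adding, finding and removing items, and count() shows that duplicates are allowed. A sketch with made-up contents:

```python
bag = ["apple", "pear", "apple"]
bag.append("plum")          # add
print("pear" in bag)        # True: find (a membership test)
print(bag.count("apple"))   # 2: duplicates are fine in a bag
bag.remove("apple")         # remove() deletes only the first matching item
print(bag)                  # ['pear', 'apple', 'plum']
```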
Set
A set has the property of only storing one of each item. You can usually test to see if an item is in a set
(membership), add, remove and retrieve items, and join two sets together in various ways corresponding to
set theory in math (e.g. union, intersection etc). QBASIC and Tcl do not implement sets directly, and neither did
early versions of Python (a built-in set type arrived in Python 2.4), but sets can be easily implemented in both
Python and Tcl by using the built-in dictionary type.
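A sketch of both approaches in Python: first a set simulated with dictionary keys, as described above (storing None as the value is an arbitrary choice), then the built-in set type:

```python
# A set simulated with a dictionary: only the keys matter.
fruit = {}
for name in ["apple", "pear", "apple"]:
    fruit[name] = None          # a repeated key just overwrites, so no duplicates
print(sorted(fruit))            # ['apple', 'pear']
print("pear" in fruit)          # True: membership test

# The built-in set type does the same job and adds set-theory operations.
a = {"apple", "pear"}
b = {"pear", "plum"}
print(sorted(a | b))            # ['apple', 'pear', 'plum']: union
print(sorted(a & b))            # ['pear']: intersection
```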
Queue
A queue is rather like a stack except that the first item into a queue is also the first item out. This is known as
First In First Out or FIFO behaviour.
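One convenient way to get a queue in Python is collections.deque from the standard library: append() adds at the back and popleft() removes from the front. A sketch with made-up names:

```python
from collections import deque

queue = deque()
for customer in ["Ann", "Bob", "Cat"]:
    queue.append(customer)      # join at the back

print(queue.popleft())          # Ann: first in, first out
print(list(queue))              # ['Bob', 'Cat']
```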
There's a whole bunch of other collection types but these are the main ones that you might see. (In fact we'll
only be dealing with a few of these in this tutor!)
Files
As a computer user you know all about files - the very basis of nearly everything we do with computers. It
should be no surprise, then, to discover that most programming languages provide a special file type of data.
However files and the processing of them are so important that I will defer discussing them till later when
they get a whole section to themselves.
Complex/User Defined
Sometimes the basic types described above are inadequate even when combined in collections. Sometimes
what we want to do is group several bits of data together then treat them as a single item. An example might be
the description of an address: a house number, a street and a town. Finally there's the post code or zip code.
In BASIC such a grouping is declared with a TYPE block:
Type Address
HsNumber AS INTEGER
Street AS STRING * 20
Town AS STRING * 15
ZipCode AS STRING * 7
End Type
The number after the STRING is the length of the string, which is fixed in advance.
In Python we define a class to do the same job:
>>> class Address:
...     def __init__(self, Hs, St, Town, Zip):
...         self.HsNumber = Hs
...         self.Street = St
...         self.Town = Town
...         self.ZipCode = Zip
...
That may look a little arcane, but don't worry, I’ll explain what the def __init__(...) and self bits
mean in the section on object orientation. Some people have had problems trying to type this example in at
the Python prompt. At the end of this chapter you will find a box with more explanation, but you can just
wait till we get the full story later in the course if you prefer. If you do try typing it into Python then please
make sure you copy the indentation shown. As you'll see later Python is very particular about indentation
levels.
The main thing I want you to recognise in all of this is that we have gathered several pieces of data into a
single structure.
Accessing Complex Types
We can assign a complex data type to a variable too, but to access the individual fields of the type we must
use some special access mechanism (which will be defined by the language). Usually this is a dot.
To consider the case of the address type we defined above we would do this in BASIC:
And in Python, assuming you have already typed in the class definition above:
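A sketch of the Python side. It repeats the class definition so that it runs on its own, and the field values simply match the Tcl version of the example:

```python
class Address:
    def __init__(self, Hs, St, Town, Zip):
        self.HsNumber = Hs
        self.Street = St
        self.Town = Town
        self.ZipCode = Zip

addr = Address(7, "High St", "Anytown", "123 456")
print(addr.HsNumber)    # 7
print(addr.Street)      # High St
```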
This creates an instance of our Address type and assigns it to the variable addr. We then print out the
HsNumber and Street fields of the newly created instance using the dot operator. You could, of course,
create several new Address type variables each with their own individual values of house number, street etc.
In Tcl the nearest approximation to complex types is to simply store the fields in a list. You need to
remember the sequence of the fields so that you can extract them again. This could be simplified a little by
assigning the field numbers to variables; in this way the example above would look like:
set HsNum 0
set Street 1
set Town 2
set zip 3
set addr [list 7 "High St" "Anytown" "123 456"]
puts [format "%s %s" [lindex $addr $HsNum] [lindex $addr $Street]]
Note the use of the Tcl format string and the nested sets of '[]'s.
User defined types can, in some languages, have operations defined too. This is the basis of what is known
as object oriented programming. We dedicate a whole section to this topic later but essentially an object is a
collection of data elements and the operations associated with that data, wrapped up as a single unit. Python
uses objects extensively in its standard library of modules and also allows us as programmers to create our
own object types.
Object operations are accessed in the same way as data members of a user defined type, via the dot operator,
but otherwise look like functions. These special functions are called methods. We have already seen this with
the append() operation of a list. Recall that to use it we must tag the function call onto the variable name:
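For instance, a sketch in Python (the names and the string are illustrative). Strings have methods too, called the same way:

```python
aList = []
aList.append(42)              # call the append method of aList
print(aList)                  # [42]

name = "hello"
print(name.upper())           # HELLO: upper() returns a new, capitalised string
```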
When an object type, known as a class, is provided in a module we must import the module (as we did with
sys earlier), then prefix the object type with the module name to create an instance that we can store in a
variable. We can then use the variable without using the module name.
We will illustrate this by considering a fictitious module meat which provides a Spam class. We import the
module, create an instance of Spam and access its operations and data like so:
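Since meat is fictitious, the sketch below shows the same pattern with a real standard-library module instead: datetime provides a date class. We import the module, create an instance by prefixing the class with the module name, then use the variable on its own:

```python
import datetime

day = datetime.date(2000, 1, 1)   # module name, dot, class name
print(day.year)                   # 2000: a data member
print(day.isoformat())            # 2000-01-01: an operation (method)
print(day.weekday())              # 5: Monday is 0, so 5 means Saturday
```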
Other than the need to create an instance, there’s no real difference between using objects provided within
modules and functions found within modules. Think of the object name simply as a label which keeps
related functions and variables grouped together.
Another way to look at it is that objects represent real world things, to which we as programmers can do
things. That view is where the original idea of objects in programs came from: writing computer simulations
of real world situations.
Neither QBASIC nor Tcl provides facilities for adding operators to complex types. There are, however, add-on
libraries for Tcl which allow this, and the more modern Visual Basic dialect of BASIC does permit this.
Python also supports some operations that are much less common in other languages. For example, it
supports list slicing ( spam[X:Y] ) and tuple assignment ( X, Y = 12, 34 ). It also has the facility to
perform an operation on every member of a collection using its map() function. There are many more; it’s
often said that "Python comes with the batteries included". For details of how these Python specific
operations work you’ll need to consult the Python documentation.
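A quick taste of the three features just mentioned, as a sketch (spam and the other names are illustrative). In Python 3 map() returns an iterator, hence the list() around it:

```python
spam = [10, 20, 30, 40, 50]
print(spam[1:3])                 # [20, 30]: from index 1 up to, but not including, 3

X, Y = 12, 34                    # tuple assignment: both variables set at once
X, Y = Y, X                      # a handy use: swapping two values
print(X)                         # 34

doubled = list(map(lambda n: n * 2, spam))   # apply an operation to every member
print(doubled)                   # [20, 40, 60, 80, 100]
```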
Finally, it’s worth pointing out that although I say they are Python specific, that is not to say that they can’t
be found in any other languages but rather that they will not all be found in every language. The operators
that we cover in the main text are generally available in some form in virtually all modern programming
languages.
That concludes our look at the raw materials of programming; let’s move on to the more exciting topic of
technique and see how we can put these materials to work.
More information on the Address example
Although, as I said earlier, the details of this example are explained later, some readers have found difficulty
getting the example to work. This note gives a line by line explanation of the Python code:
>>> class Address:
The class statement tells Python that we are about to define a new type called, in this case, Address.
The colon indicates that any indented lines following will be part of the class definition. The definition will
end at the next unindented line. If you are using IDLE you should find that the editor has indented the next
line for you; if you are working at a command line Python prompt in an MS DOS window then you will need to
manually indent the lines as shown. Python doesn't care how much you indent by, just so long as it is
consistent.
...     def __init__(self, Hs, St, Town, Zip):
The first item within our class is what is known as a method definition. This method is called __init__
and is a special operation performed by Python when we create an instance of our new class; we'll see that
shortly. The colon, as before, simply tells Python that the next set of indented lines will be the actual
definition of the method.
...         self.HsNumber = Hs
This line and the next three all assign values to the internal fields of our object. They are indented from the
def statement to tell Python that they constitute the actual definition of the __init__ operation. The
blank line tells the Python interpreter that the class definition is finished so that we get the >>> prompt
back.
>>> Addr = Address(7, "High St", "Anytown", "123 456")
This creates a new instance of our Address type and Python uses the __init__ operation defined above to
assign the values we provide to the internal fields. The instance is assigned to the Addr variable just like an
instance of any other data type would be.
>>> print Addr.HsNumber, Addr.Street
Now we print out the values of two of the internal fields using the dot operator to access them.
As I said we cover all of this in more detail later in the tutorial. The key point to take away is that Python
allows us to create our own data types and use them pretty much like the built in ones.
Points to remember
• Data comes in many types and the operations you can successfully perform
will depend on the type of data you are using.
• Simple data types include character strings, numbers, Boolean or 'truth' values.
• Complex data types include collections, files, dates and user defined data
types.
• There are many operators in every programming language and part of learning
a new language is becoming familiar with both its data types and the operators
available for those types.
• The same operator (e.g. addition) may be available for different types, but the
results may not be identical, or even apparently related!
More Sequences and Other Things
What will we cover?
We introduce a new tool for entering Python programs. We look at the use of
variables to store information for future use, and at how to combine a sequence of
commands to perform a task.
OK, now we know how to type simple single entry commands into Python and have started to consider data
and what we can do with it. Let's see what happens when we type multiple commands into Python.
There is a full tutor on using IDLE on the Python web site under the IDLE topic. There is also a gentler one
which covers some of the same Python things we are discussing here as well, at Danny Yoo's web site. I'd
suggest you start with Danny's one then once you feel more confident with IDLE go back to the official one
at python.org.
If you are using MS Windows there is yet another option in the form of PythonWin which you can download
as part of the winall package. This gives access to all the Windows MFC low level programming functions
and importantly, a very good alternative to IDLE. It only works in Windows but in my opinion is slightly
superior to IDLE. On the other hand IDLE is very new and in subsequent releases may overtake PythonWin.
Whatever happens, it's nice to have a choice!
A quick comment
One of the most important of programming tools is one that beginners often feel is useless on first
acquaintance - comments. Comments are just lines in the program which describe what's going on. They
have no effect whatsoever on how the program operates, they are purely decorative. They do have an
important role to play - they tell the programmer what's going on and more importantly why. This is
especially important if the programmer reading the code isn't the one who wrote it, or if it's a long time since
he/she wrote it. Once you've been programming for a while you'll really appreciate good comments. From
now on I'll be commenting the code fragments that I write. Gradually the amount of explanatory text will
diminish as the explanation appears in comments instead.
Every language has a way of indicating comments. In BASIC it's REM at the beginning of a comment.
Everything after the REM is ignored:
You might recognise REM if you have ever written any MSDOS batch files, since they use the same
comment marker.
Most BASICs also allow you to use ' instead of REM which is easier to type but harder to see. The choice is
yours.
Python and Tcl both use a # symbol as their comment marker. Anything following a # is ignored:
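For example, a comment can sit at the end of a line of code (a sketch; the values are arbitrary):

```python
v = 12        # give v the value 12
x = v * v     # x is v squared
print(x)      # 144
```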
Incidentally this is very bad commenting style. Your comment should not merely state what the code does -
we can see that for ourselves! It should explain why it's doing it:
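Something along these lines is more useful. The names and the scenario (a timer that reports hours, converted to seconds for a report) are invented purely for illustration:

```python
SECONDS_PER_HOUR = 3600      # a named constant explains itself better than a bare 3600
elapsed_hours = 2.5          # made-up value, as if read from a timer

# The report needs seconds but the timer gives hours, so convert here.
elapsed_seconds = elapsed_hours * SECONDS_PER_HOUR
print(elapsed_seconds)       # 9000.0
```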
>>> v = 7
>>> w = 18
>>> x = v + w # use our variables in a calculation
>>> print x
25
What's happening here is that we are creating variables ( v, w, x ) and manipulating them. It's rather
like using the M button on your pocket calculator to store a result for later use.
We can make this prettier by using a format string to print the result:
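A sketch using the % format operator; v, w and x hold the same values as above, and the wording of the message is just an example:

```python
v = 7
w = 18
x = v + w
message = "The sum of %d and %d is: %d" % (v, w, x)
print(message)    # The sum of 7 and 18 is: 25
```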
One advantage of format strings is that we can store them in variables too:
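For example (fmtString is an illustrative name). Once stored, the same format string can be reused with different values:

```python
v, w = 7, 18
fmtString = "The sum of %d and %d is: %d"    # the format string kept in a variable
print(fmtString % (v, w, v + w))             # The sum of 7 and 18 is: 25
print(fmtString % (3, 4, 3 + 4))             # The sum of 3 and 4 is: 7
```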
Order matters
By now you might be thinking that this sequence construct is a bit over-rated and obvious. You would be
right in so far as it's fairly obvious, but it's not quite as simple as it might seem. There can be hidden traps.
Consider the case where you want to 'promote' all the headings in an HTML document up a level:
Now in HTML the headings are set by surrounding the text with
<H1>text</H1> for level 1 headings,
<H2>text</H2> for level 2 headings,
<H3>text</H3> for level 3 headings and so on.
The problem is that by the time you get to level 5 headings the heading text is often smaller than the body
text, which looks odd. Thus you might decide to promote all headings up one level. It's fairly easy to do that
with a simple string substitution in a text editor, substitute '<H2' with '<H1' and '</H2' with '</H1' and so on.
Consider though what happens if you start with the highest numbers - say H4 -> H3, then do H3 -> H2 and
finally H2 -> H1. All of the headings will have moved to H1! Thus the order of the sequence of actions is
important. The same is just as true if we wrote a program to do the substitution (which we might well want
to do, since promoting headings may be a task we do regularly).
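The trap is easy to reproduce with Python's str.replace(). In this sketch promote() is a hypothetical helper and the sample HTML is made up; the only difference between the two calls is the order in which the heading levels are processed:

```python
def promote(html, levels):
    """Move each heading level in `levels` up by one, in the order given."""
    for level in levels:
        html = html.replace("<H%d" % level, "<H%d" % (level - 1))
        html = html.replace("</H%d" % level, "</H%d" % (level - 1))
    return html

page = "<H2>Intro</H2><H3>Detail</H3><H4>Aside</H4>"

print(promote(page, [4, 3, 2]))   # wrong order: every heading ends up as H1
print(promote(page, [2, 3, 4]))   # right order: H1, H2 and H3 as intended
```

Working from the lowest number upwards means each heading is moved exactly once, because nothing is ever renamed to a level that is still waiting to be processed.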
A Multiplication Table
I'm now going to introduce a programming exercise that we will develop over the next few chapters. The
solutions will gradually improve as we learn new techniques.
Recall that we can type long strings by enclosing them in triple quotes. Let's use that to construct a
multiplication table:
>>> s = """
1 x 12 = %d
2 x 12 = %d
3 x 12 = %d
""" # be careful - you can't put comments inside
>>> # strings, they'll become part of the string!
>>> print s % (12, 2*12, 3*12)
By extending that we could print out the full 12 times table from 1 to 12. But is there a better way? The
answer is yes, let's see what it is.
Points to remember
• IDLE is a cross platform development tool for writing Python programs.
• Comments can make programs clearer to read but have no effect on the
operation of the program
• Variables can store intermediate results for later use
Looping - Or the art of repeating oneself!
What will we cover?
How to use loops to cut down on repetitive typing. Different types of loop and when
to use them.
In the last exercise we printed out part of the 12 times table. But it took a lot of typing and if we needed to
extend it, it would be very time consuming. Fortunately there is a better way and it's where we start to see the
real power that programming languages offer us.
FOR Loops
What we are going to do is get the programming language to do the repetition, substituting a variable which
increases in value each time it repeats. In Python it looks like this:
>>> for i in range(1,13):
...     print "%d x 12 = %d" % (i, i*12)
...
Note 1: We need to specify 13 in range(1,13) because range() generates numbers from the first number up
to, but not including, the second number. This may seem somewhat bizarre at first but there are reasons and
you get used to it.
Note 2: The for operator in Python is actually a foreach operator in that it applies the subsequent code
sequence to each member of a collection. In this case the collection is the list of numbers generated by
range(). You can prove that by typing print range(1,13) at the Python prompt and seeing what
gets printed. (In Python 3, range() produces its numbers on demand, so type print(list(range(1,13))) instead.)
Note 3: The print line is indented further than the for line above it. That is a very important point since
it's how Python knows that the print is the bit to repeat. It doesn't matter how much indentation you use so
long as it's consistent.
Note 4: In the interactive interpreter you need to hit return twice to get the program to run. The reason is that
the Python interpreter can't tell whether the first one is another line about to be added to the loop code or not.
When you hit Enter a second time Python assumes you're finished entering code and runs the program.
So how does it work? First, Python uses range() to generate the numbers 1 to 12. Next, Python makes i
equal to the first value in the list, in this case 1. It then executes the bit of code that is indented, using the
value i = 1:
1 x 12 = 12
Python then goes back to the for line and sets i to the next value in the list, this time 2. It again executes
the indented code, this time with i = 2:
2 x 12 = 24
It keeps repeating this sequence until it has set i to all the values in the list. At that point it moves to the next
command that is not indented - in this case there aren't any more commands so the program stops.
Here's the same loop in BASIC:
FOR I = 1 to 12
PRINT I, " x 12 = ", I*12
NEXT I
This is more explicit, and it is easier to see what is happening. However the Python version is more flexible
in that we can loop over a set of numbers, the items in a list or any other collection (e.g. a string).
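To illustrate that flexibility, here is a sketch of Python's for looping over a list and over a string (the contents are arbitrary):

```python
for colour in ["red", "green", "blue"]:
    print(colour)                # each item of the list in turn

letters = []
for ch in "spam":
    letters.append(ch.upper())   # each character of the string in turn
print(letters)                   # ['S', 'P', 'A', 'M']
```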
And in Tcl
Tcl uses a for construct that is common in many programming languages, being modelled on C. It looks
like this:
The loop body will only execute if the test part is true. Each of these parts can contain
arbitrary code but the test part must evaluate to a boolean value (which in Tcl means
zero or non-zero). Note that although I have shown the loop body indented, this is
purely to aid understanding. Tcl does not require me to indent the block, rather the
curly braces are used to mark the beginning and end.
WHILE Loops
FOR loops are not the only type of looping construct available. Which is just as well since FOR loops
require us to know, or be able to calculate in advance, the number of iterations that we want to perform. So
what happens when we want to keep doing a specific task until something happens but we don't know when
that something will be? For example, we might want to read and process data from a file, but we don't know
in advance how many data items the file contains. We just want to keep on processing data until we reach
the end of the file. That's possible but difficult in a FOR loop.
To solve this problem we have another type of loop: the WHILE loop. It looks like this in BASIC:
J = 1
WHILE J <= 12
PRINT J, " x 12 = ", J*12
J = J + 1
WEND
This produces the same result as before but uses a WHILE loop instead of a FOR loop. Notice the structure
is WHILE, followed by an expression which evaluates to a Boolean value (true or false, remember?). If the
expression is true, the code inside the loop is executed; the expression is then tested again, and the loop
stops as soon as it becomes false.
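To connect this with the file example above, here is a sketch of a Python while loop that keeps processing items until none are left, without knowing in advance how many there are. A list stands in for the file, an assumption made to keep the example self-contained:

```python
pending = ["alpha", "beta", "gamma"]   # stands in for data read from a file
processed = []

while pending:                  # a non-empty list counts as true
    item = pending.pop(0)       # take the next item from the front
    processed.append(item.upper())

print(processed)                # ['ALPHA', 'BETA', 'GAMMA']
```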
As an alternative we'll look at the Tcl version:
set j 1
while {$j <= 12} {
    puts [format "%d x 12 = %d" $j [expr $j*12]]
    set j [expr $j + 1]
}
As you see, the structure is pretty similar, just with curly brackets (braces) instead of BASIC's WEND.
But what's that mess inside the loop? Remember format strings in Python? format is Tcl's equivalent. The
$j just means the value of j (rather than the letter 'j'!) and the expr just says 'calculate the next bit as an
expression'. The square brackets tell Tcl which bits to do first. Tcl is an unusual language in that it attempts
to interpret its code in one go, so without the brackets it would try to print the word 'expr' then see some
more values and give up with an error message. We need to tell it to do the sums, then format the string, then
print the result. Confused? Don't worry about it. As I said Tcl is an unusual language with a few uniquely
good points and a lot of strangeness.
Finally, the Python version:
>>> j = 1
>>> while j <= 12:
...     print "%d x 12 = %d" % (j, j*12)
...     j = j + 1
...
By now that should look pretty straightforward. Just one thing to point out - do you see the colon (:) at the
end of the while and for lines above? That just tells Python that there's a chunk of code (a block) coming
up. Most languages have an end of block marker (like BASIC's WEND or Tcl's braces) but Python uses
indentation to indicate the structure. This means that it's important to indent all of the lines inside the loop
by the same amount; this is good practice anyway since it's easier to read!
Notice that to print the 7 times table instead, we would have to change the 12 to a 7 in two places, and if we
wanted yet another table we would have to change them again. Wouldn't it be better if we could specify the
multiplier that we want just once?
We can do that by replacing the 12s in the print statement with a variable, and setting that variable before we
run the loop:
>>> multiplier = 12
>>> for j in range(1,13):
...     print "%d x %d = %d" % (j, multiplier, j*multiplier)
...
That's our old friend the 12 times table. But now to change to the seven times table, we only need to change
the value of 'multiplier'.
Note that here we have combined sequencing and loops: first a single command, multiplier = 12,
followed in sequence by a for loop.
Looping the loop
Let's take the previous example one stage further. Suppose we want to print out all of the times tables from 2
to 12 (1 is too trivial to bother with). All we really need to do is set the multiplier variable as part of a loop,
like this:
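A sketch of that nested loop written as a script; range(2, 13) gives the tables 2 to 12, and the inner loop is the original 12-line loop unchanged.

```python
# The outer loop chooses the table; the inner loop prints that table.
for multiplier in range(2, 13):          # tables 2 to 12
    for j in range(1, 13):               # rows 1 to 12 of the current table
        print("%d x %d = %d" % (j, multiplier, j * multiplier))
```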
Notice that the part indented inside the first for loop is exactly the same loop that we started out with. It
works as follows: we set multiplier to the first value (2), then go round the inner loop; then we set
multiplier to the next value (3) and go round the inner loop again, and so on. This technique is known as
nesting loops.
One snag is that all the tables run together. We can fix that by printing a separator line at the end of each
pass through the outer loop, like this:
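A sketch with the separator added; the row of dashes is an arbitrary choice. The separator print is indented to the same level as the inner for, so it runs once per table rather than once per row.

```python
for multiplier in range(2, 13):
    for j in range(1, 13):
        print("%d x %d = %d" % (j, multiplier, j * multiplier))
    print("------------------")           # same level as the inner for: once per table
```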
Note that the second print statement lines up with the second (inner) for: it is the second statement in the
outer loop's body. Remember, the indenting level is very important in Python.
Experiment with getting the separator to indicate which table it follows, in effect to provide a caption. Hint:
You probably need to use the multiplier variable and a format string.
Other loops
Some languages provide more looping constructs but some kind of for and while are usually there.
(Some languages, such as the original Oberon, provide only while-style loops, since a while loop can simulate a for loop - as we saw above.)
Other loops you might see are:
do-while
Same as a while but the test is at the end so the loop always executes at least
once.
repeat-until
Similar to above but the logic of the test is reversed.
GOTO, JUMP, LOOP etc
Mainly seen in older languages, these usually set a marker in the code and
then explicitly jump directly to it.
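Python has neither do-while nor repeat-until. A common idiom, sketched here with a hypothetical count_runs function, is a while True loop that tests its condition at the end of the body and leaves with break, so the body always runs at least once.

```python
def count_runs(limit):
    """Emulate do-while: the body runs once before the test is ever checked."""
    runs = 0
    while True:
        runs = runs + 1            # loop body
        if not (runs < limit):     # do-while test, checked at the END of the body
            break
    return runs

print(count_runs(3))   # 3
print(count_runs(0))   # 1: a plain while loop with this test would not run at all
```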
Points to remember
• FOR loops repeat a set of commands for a fixed number of iterations.
• WHILE loops repeat a set of commands until some terminating condition is met.
They may never execute the body of the loop if the loop condition is
false to start with.
• Other types of loops exist but FOR and WHILE are nearly always provided.
• Python for loops are really foreach loops - they operate on a list of items.
• Loops may be nested one inside another.
Coding Style
What will we cover?
Several new uses for comments, how to lay out code using indentation to improve
readability, and an introduction to the use of modules for storing our programs.
Comments
I've already spoken about comments in the 'More Sequences' section. However there are more things we can
do with comments and I'll enlarge on those here:
It is good practice to create a file header at the start of each file. This should provide details such as the
creation date, author, version and a general description of the contents, often followed by a log of changes.
This block will appear as a comment:
#############################
# Module: Spam.py
# Author: A.J.Gauld
# Date: 1999/09/03
# Version: Draft 0.4
#
# Description: This module provides a Spam object which can be
# combined with any other type of Food object to create
# interesting meal combinations.
#
###############################
# Log:
# 1999/09/01 AJG - File created
# 1999/09/02 AJG - Fixed bug in pricing strategy
# 1999/09/02 AJG - Did it right this time!
# 1999/09/03 AJG - Added broiling method(cf Change Req #1234)
################################
import sys, string, food
...
Comments are also often used to isolate a faulty section of code. For example, assume a program reads some
data, processes it, prints the output and then saves the results back to the data file. If the results are not what
we expect, it would be useful to temporarily prevent the (erroneous) data being saved back to the file and thus
corrupting it. We could simply delete the relevant code, but a less radical approach is to convert the
lines into comments, like so:
results = []
data = readData(datafile)
for item in data:
    results.append(calculateResult(item))
printResults(results)
######################
# Comment out till bug in calculateResult fixed
# for item in results:
#     datafile.save(item)
######################
print 'Program terminated'
Once the fault has been fixed we can simply delete the comment markers to make the code active once more.
Documentation strings
All languages allow you to create comments to document what a function or module does, but a few such as
Python and Smalltalk go one stage further and allow you to document the function in a way that the
language/environment can use to provide interactive help while programming. In Python this is done using
the """documentation""" string style:
class Spam:
    """A meat for combining with other foods"""
    def __init__(self):
        pass    # the rest of the class is omitted here

print Spam.__doc__
Note: We can access the documentation string by printing the special __doc__ variable. Modules,
functions and classes/methods can all have documentation strings. For example try:
import sys
print sys.__doc__
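Functions get documentation strings in exactly the same way. In this small sketch (area is a made-up example function) the docstring is stored on the function's __doc__ attribute; typing help(area) at the >>> prompt displays the same text.

```python
def area(width, height):
    """Return the area of a width by height rectangle."""
    return width * height

print(area.__doc__)    # Return the area of a width by height rectangle.
```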
Indentation
This is one of the most hotly debated topics in programming. It almost seems that every programmer has
his/her own idea of the best way to indent code. As it turns out, studies have shown that at least some
factors are genuinely important beyond cosmetics: they actually help us understand the code better.
The reason for the debate is simple. In most programming languages the indentation is purely cosmetic, an
aid to the reader. (In Python it is in fact needed and is essential to the proper working of the program!) Thus:
FOR I = 1 TO 10
    PRINT I
NEXT I
is exactly the same as:
FOR I = 1 TO 10
PRINT I
NEXT I
so far as the BASIC interpreter is concerned. It's just easier for us to read with indentation.
The key point is that indentation should reflect the logical structure of the code, so that visually it follows
the flow of the program. To do that it helps if blocks look like blocks, thus:
XXXXXXXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXXX
which reads better than:
XXXXXXXXXXXXXXXXXXXXX
  XXXXX
    XXXXXXXXXXXX
    XXXXXXXXXXXX
    XXXXXXXXXXXX
  XXXXX
because it's clearly all one block. Studies have shown significant improvements in comprehension when
indenting reflects the logical block structure. In the small samples we've seen so far it may not seem
important, but when you start writing programs with hundreds or thousands of lines it will become much
more so.
Variable Names
The variable names we have used so far have been fairly meaningless, mainly because they had no meaning
but simply illustrated techniques. In general it's much better if your variable names reflect what you want
them to represent. For example, in our times table exercise we used 'multiplier' as the variable to indicate
which table we were printing. That is much more meaningful than simply 'm', which would have worked
just as well and been less typing.
It's a trade-off between comprehensibility and effort. Generally the best choice is to go for short but
meaningful names. Too long a name becomes confusing and is difficult to get right consistently (for example,
I could have used the_table_we_are_printing instead of multiplier, but it's far too long and
not really much clearer).
Modular Programming
I'll explain this in more detail later but for now I simply want to describe the concept and importantly how
we can use modules to save our work - currently all your programs have been lost as soon as you exit
Python.
While the Python interactive interpreter prompt (>>>) is very useful for trying out ideas quickly, everything
you type is lost the minute you exit. In the longer term we want to be able to write programs and then run them
over and over again. To do this in Python we create a text file with an extension .py (this is a convention
only; you could use anything you like, but in my opinion it's a good idea to stick with convention). You
can then run your programs from a command prompt by typing:
$ python spam.py
Where spam.py is the name of your Python program file.
Note for Unix users: The first line of a Python script file should contain the sequence #! followed by the
full path of python on your system. (You can find that by typing $ which python at your shell
prompt.) The file must also be made executable, for example with chmod +x spam.py, before it can be run by name.
Note for Windows users: Under Windows you can set up an association for files ending .py within
Explorer. This will allow you to run Python programs by simply double clicking the file's icon. This should
already have been done by the installer. You can check by finding some .py files and trying to run them. If
they start (even with a Python error message) it's set up.
The other advantage of using files to store the programs is that you can edit mistakes without having to
retype the whole fragment or, in IDLE, cursor all the way up past the errors to reselect the code. IDLE
supports having a file open for editing and running it from the 'Edit|Run module' menu.
From now on I won't normally be showing the >>> prompt in examples, I'll assume you are creating the
programs in a separate file and running them either within IDLE or from a command prompt (my personal
favourite).
Points to remember
• Comments can be used to temporarily prevent code from executing, which is
useful when testing or 'debugging' code.
• Comments can be used to provide an explanatory header with version history
of the file.
• Documentation strings can be used to provide run-time information about a
module and the objects within it.
• Indentation of blocks of code helps the reader see clearly the logical structure
of the code.
• By typing a python program into a file instead of at the Python '>>>' prompt
the program can be saved and run on demand by typing $ python
progname.py at the command prompt or by double clicking the filename
within an Explorer window on Windows.
Conversing with the user
What will we cover?
How to prompt the user to enter data and how to read that data once it is entered. We
will show how to read both numerical and string based data, and how to
read data supplied as command line arguments.
So far our programs have only dealt with static data: data that, if need be, we can examine before the
program runs, writing the program to suit. Most programs aren't like that. Most programs expect to be
driven by a user, at least to the extent of being told what file to open, edit etc. Others prompt the user for data
at critical points. Let's see how that can be done before we progress any further. In Python the basic tool is
the raw_input function, which displays an optional prompt string and returns whatever the user types as a string.
raw_input has a cousin called input. The difference is that raw_input collects the characters the
user types and presents them as a string, whereas input collects them and tries to form them into a
number. For example if the user types '1','2','3' then input will read those 3 characters and convert them into
the number 123.
Unfortunately there's a big snag to using input. That's because input doesn't just evaluate numbers but
rather treats any input as Python code and tries to execute it. Thus a knowledgeable but malicious user could
type in a Python command that deleted a file on your PC! For this reason it's better to stick to raw_input
and convert the string into the data type you need using Python's built-in conversion functions. This is
actually pretty easy:
>>> multiplier = int(raw_input("Which multiplier do you want? "))
You see? We just wrapped the raw_input call in a call to int. It has the same effect as using input but is
much safer. There are other conversion functions too so that you can convert to floats etc as well.
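Converting with int raises a ValueError if the user types something that is not a whole number, so a careful program keeps asking. Here is a minimal sketch with a hypothetical ask_for_int function. It picks raw_input under Python 2 and input (which behaves like Python 2's raw_input) under Python 3; the optional reader parameter is an addition made only so the function can be tested without a keyboard.

```python
try:
    read_line = raw_input      # Python 2: returns what the user typed as a string
except NameError:
    read_line = input          # Python 3: input() behaves like Python 2's raw_input

def ask_for_int(prompt, reader=read_line):
    """Keep asking until the reply converts cleanly with int()."""
    while True:
        text = reader(prompt)
        try:
            return int(text)   # safe conversion instead of Python 2's input()
        except ValueError:
            print("'%s' is not a whole number, please try again" % text)

# Interactive use: multiplier = ask_for_int("Which multiplier do you want? ")
```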
BASIC INPUT
In BASIC the INPUT statement reads input from the user thus:
INPUT "Which multiplier do you want? ", M
As you see, it's very similar to Python except that you put the variable at the end. Also, BASIC uses INPUT for
both numbers and strings. There are usually a few extra features in BASIC's INPUT statement, so you should
look at the documentation for your particular version.
Tcl has its own input mechanism, which is based around files (which may include the standard input and
output 'files') and a command called gets. This reads input from the specified file which in our case will be
stdin.
[ Note: This program will not work from the standard tclsh80 or wish80 prompt. Instead you will need to
type it into a file (say input.tcl) and run it from the command prompt like so:
C:\PROJECTS\Tcl>tclsh80 input.tcl
]
The -nonewline option to puts simply prevents the cursor from moving to the next line after
displaying the prompt message. flush forces stdout to write its contents immediately to ensure that it
appears on screen. The for loop is almost identical to the version we saw in the loops section.
In Python the standard input 'file' lives in the sys module, where it is called sys.stdin, and raw_input() uses it
automatically. Tcl can read from any file using gets (short for 'get string'). You can do the same in Python; try this:
import sys
print "Type a value: ", # comma prevents newline
value = sys.stdin.readline() # use stdin explicitly
print value
The advantage of the explicit version is that you can do fancy things like making stdin point to a real file, so
the program reads its input from the file rather than the terminal. This can be useful for long testing sessions,
whereby instead of sitting typing each input as requested we simply let the program read its input from a file.
[ This has the added advantage of ensuring that we can run the test repeatedly, sure that the input will be
exactly the same each time, and so hopefully will the output. This technique of repeating previous tests to
ensure that nothing got broken is called regression testing by programmers. ]
Finally, there is also a sys.stdout 'file' that can likewise be redirected, this time to a file. A print
statement such as print "Hello world" is equivalent to:
import sys
sys.stdout.write("Hello world\n")
Obviously if stdout did not refer to the screen then the output would be written to a file. This is how the
operating system commands work when we use redirection at the command prompt:
C:> dir
C:> dir > dir.txt
The first command prints a directory listing to the screen. The second prints it to a file. By using the '>'
sign we tell the program to redirect stdout to the file dir.txt.
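The same redirection can be done from inside a Python program by assigning a file to sys.stdout. A minimal sketch, with a hypothetical print_to_file helper and a temporary file chosen only for the demonstration; the real stdout is restored in a finally clause so later output reaches the screen again.

```python
import os
import sys
import tempfile

def print_to_file(filename, text):
    """Send print output to filename by temporarily replacing sys.stdout."""
    saved_stdout = sys.stdout
    with open(filename, "w") as out:
        sys.stdout = out
        try:
            print(text)                  # written to the file, not the screen
        finally:
            sys.stdout = saved_stdout    # restore the screen

path = os.path.join(tempfile.mkdtemp(), "out.txt")
print_to_file(path, "Hello world")
with open(path) as f:
    contents = f.read()
print(contents.strip())                  # Hello world (now on the screen)
```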
Command Line Parameters
One other type of input is from the command line. For example when you run your text editor like:
EDIT Foo.txt
In most languages the system provides an array or list of strings containing the command line words. Thus
the first element will contain the command itself, the second element will be the first argument, etc. There is
usually some kind of magic variable that holds the number of elements in the list.
In Python that list is held by the sys module and called argv (for 'argument values'). We can extract the
elements using indexing or by iterating over the list, thus:
import sys
for item in sys.argv:
    print item
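Note that this only works if you put it in a file (say args.py) and execute it from the operating system prompt,
followed by some arguments (any will do), like this:
$ python args.py 1 23 fred
Tcl keeps its command line words in a similar list called argv, with the program name held separately in
argv0. Once again you will need to run a Tcl version as a script from the operating system command prompt
and provide some sample arguments.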
And BASIC
While Tcl does not appear to have an 'input' equivalent, BASIC does not seem to have an argv equivalent,
although it would be possible to use operating system features to access them - for example they are stored
in an environment variable under DOS, so you can use the ENVIRON$ function. That's far too advanced for
this course, however, and I recommend that in BASIC programs you prompt the user for the values
interactively.
That's really as far as we'll go with user input in this course. It's very primitive but you can write useful
programs with it. In the early days of Unix or PCs it's the only kind of interaction you got. Python, Tcl and
BASIC (in its 'Visual' incarnation) are all capable of writing sophisticated GUI programs with windows,
dialogs etc... but that's a bit too advanced for this course. Having said that the case study does provide a brief
example of getting input via a GUI in Python but we won't be explaining too much about how it works.
There are Web tutorials available for doing that once you have a good grounding in the essentials; I'll list some
of them in the references page.
Points to remember
• Use raw_input for reading characters/strings, and convert the result with int() or float() to read numbers; input can read numbers directly but is unsafe.
• Both input and raw_input can display a string to prompt the user.
• BASIC's INPUT command can be used for any type of data.
• Command line parameters can be obtained from the argv list imported from
the sys module in Python, where the first item is the name of the program.
• Tcl uses the similarly named argv list to get its command line data, but the
program name is in the separate argv0.
• The __name__ variable will be set to "__main__" if the module has been run
from the command line (or double clicked in Windows).
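A common way to use __name__ is to put a script's main work in a function and call it only when the file is run directly, so the same file can also be imported as a module without side effects. In this sketch, main is just a conventional name, not something Python requires, and it returns the argument count only so it can be checked.

```python
import sys

def main(args):
    """Print each command line word; args[0] is the program name."""
    for item in args:
        print(item)
    return len(args)

if __name__ == "__main__":      # true when run as a script, false when imported
    main(sys.argv)
```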
Decisions, Decisions
What will we cover?
• The 3rd programming construct - Branching
• Single branches and multiple branches
• Using Boolean expressions
The 3rd of our fundamental building blocks is branching or conditional statements. These are simply terms
to describe the ability within our programs to execute one of several possible sequences of code (branches)
depending on some condition.
Back in the early days of Assembler programs the simplest branch was a JUMP instruction where the
program literally jumped to a specified memory address, usually if the result of the previous instruction was
zero. Amazingly complex programs were written with virtually no other form of condition possible -
vindicating Dijkstra's statement about the minimum requirements for programming. When high level
languages came along a new version of the JUMP instruction appeared called GOTO. In fact BASIC still
provides GOTO and you can try it out by typing the following bit of code:
Notice how even in such a short program it takes a few seconds to figure out what's going to happen. There
is no structure to the code, you have to literally figure it out as you read it. In large programs it becomes
impossible. For that reason most modern programming languages either don't have a direct JUMP or GOTO
statement or discourage you from using it.
The if statement
The most intuitively obvious conditional statement is the if, then, else construct. It follows the
logic of English in that if some boolean condition is true then a block of statements is executed otherwise (or
else) a different block is executed.
Hopefully that is easier to read and understand than the previous GOTO example. Of course we can put any
test condition we like after the if, so long as it evaluates to True or False, i.e. a boolean value.
import sys       # needed for sys.exit()
if 2 > 5:        # any test that evaluates to False would do here
    print "This is never printed"
else:
    sys.exit()
You can go on to chain these if/then/else statements together by nesting them one inside the other like so:
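As a sketch, here is a width test written with nested if/else statements. It uses the same widths and actions as the Tcl switch example in the Case statements section, wrapped in a hypothetical function adjust_nested so it can be run and checked. width // 2 is used so the halving stays a whole number under both Python 2 and Python 3, and area staying None for other widths is an assumption of this sketch.

```python
def adjust_nested(width, length):
    """Nested if/else: each else holds the next if one level deeper."""
    area = None                      # assumption: only set when width is 100
    if width == 100:
        area = 0
    else:
        if width == 200:
            length = length * 2
        else:
            if width == 500:
                width = width // 2
            else:
                print("width is an unexpected value!")
    return area, length, width
```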
Note: we used == to test for equality in each if statement, whereas we used = to assign values to the
variables. Using = when you mean to use == is one of the common mistakes in programming Python.
Fortunately Python reports it as a syntax error, but you might need to look closely to spot the problem.
Boolean Expressions
You might remember that in the 'Raw Materials' section we mentioned a Boolean type of data. We said it
had only two values: true or false. We very rarely create a Boolean variable but we often create
temporary Boolean values using expressions. An expression is a combination of variables and values
combined by operators to produce a value. In the following example:
if x < 5:
    print x
x < 5 is the expression. Expressions can be arbitrarily complex provided they evaluate to a single final
value. In the case of a branch that value must be either true or false. However, the definition of these two
values varies from language to language. In many languages false is the same as 0 or a non-existent
value (often called NULL, Nil or None); Python, QBASIC and Tcl all take this approach to Boolean values.
In Python an empty list or string also evaluates to false in a Boolean context. This means we can, for
example, use a while loop to process a list until the list is empty.
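A sketch of processing a list until it is empty (the list contents are made up). The while test is simply the list itself, which Python treats as false once it has no items left.

```python
queue = ["spam", "eggs", "beans"]
while queue:                  # a non-empty list is true; an empty one is false
    item = queue.pop(0)       # take the first item off the list...
    print(item)               # ...and process it
print(queue)                                      # []
print([bool([]), bool(""), bool(0), bool(None)])  # [False, False, False, False]
```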
Tcl branches
Tcl's if statement is very similar, looking like this:
if {$x < 5} {
    puts $x
} elseif {$x == 5} {
    puts "it's 5!"
} else {
    puts "greater than 5"
}
That should be straightforward. Of course the elseif and else parts are optional, but you probably
guessed that.
Case statements
A sequence of nested if/else/if/else... is such a common construction that many languages provide a special
type of branch for it. This is often referred to as a Case or Switch statement and the Tcl version looks
like:
switch $width {
100 { set area 0}
200 { set length [expr {$length * 2}] }
500 { set width [expr {$width / 2}] }
}
Python does not provide an explicit case construct but rather compromises by providing an easier
if/elif/else format:
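A sketch of the same width test written with elif, again as a hypothetical function (adjust_elif) so it can be run and checked. The widths and actions follow the Tcl switch example above; the indentation stays at one level however many tests are added.

```python
def adjust_elif(width, length):
    """The width test as a flat if/elif/else chain."""
    area = None                      # assumption: only set when width is 100
    if width == 100:
        area = 0
    elif width == 200:
        length = length * 2
    elif width == 500:
        width = width // 2           # // keeps the result a whole number
    else:
        print("width is an unexpected value!")
    return area, length, width
```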
Note the use of elif and the fact that the indentation (all important in Python) does not change. It's also
worth pointing out that both this version and the earlier Python example of this program are equally valid,
the second is just a little easier to read if there are many tests.
BASIC also provides a slightly more cumbersome version of this technique with ElseIf...THEN which
is used in exactly the same way as the Python elif but is rarely seen since CASE is easier to use.
Things to Remember
• Use if/else to branch
• The else is optional
• Multiple decisions can be represented using a CASE or if/elif construct
• Boolean expressions return true or false
Modular Programming
What will we cover?
• What modules are about
• Functions as modules
• Using module files
• Writing our own functions and modules
What's a Module?
The 4th element of programming is modular programming. In fact it's not strictly necessary, and using what
we've covered so far you can actually write some pretty impressive programs. However, as the programs get
bigger it becomes harder and harder to keep track of what's happening and where. We really need a way to
abstract away some of the details so that we can think about the problems we are trying to solve rather than
the minutiae of how the computer works. To some extent that's what Python, BASIC etc already do for us
with their built in capabilities - they prevent us from having to deal with the hardware of the computer, how
to read the individual keys on the keyboard etc.
The role of modular programming is to allow the programmer to extend the built in capabilities of the
language. It packages up bits of program into modules that we can 'plug in' to our programs. The first form
of module was the subroutine which was a block of code that you could jump to (rather like the GOTO
mentioned in the branching section) but when the block completed, it could jump back to wherever it was
called from. That specific style of modularity is known as a procedure or function. In Python and some other
languages the word module has taken on a specific meaning which we will look at shortly, but first let's
consider functions a bit more closely.
Using Functions
Before considering how to create functions let's look at how we use the many, many functions that come
with any programming language (often called the library).
We've already seen some functions in use and listed others in the operators section. Now we'll consider what
these have in common and how we can use them in our programs.
The usual pattern is that a variable takes on a value obtained by calling a function, for example
aValue = someFunction(anArgument, another). The function can accept zero or more arguments, which it treats
like internal variables. Functions can call other functions internally. Let's consider some examples in our
various languages to see how this works:
BASIC: MID$(str$,n,m)
This returns the m characters of str$ starting at the nth. (Recall that names ending in '$' in BASIC
signify a string.)
BASIC: ENVIRON$(str$)
PRINT ENVIRON$("PATH")
Prints the current PATH as set in DOS (usually via the autoexec.bat file).
Tcl: llength L
This returns the number of elements in the list L.
Note: almost everything in Tcl is a function (or, as Tcl prefers to term it, a command). This leads to some
awkward syntax but makes it very easy for the computer to read Tcl programs. This is important because Tcl
stands for Tool Command Language and was designed to be embedded in other programs as a macro language,
like Visual Basic for Applications (VBA) in Microsoft products. You can actually embed Python in the same
way, but Tcl is unique in that it was designed first and foremost with embedding in mind.
Python: pow(x,y)
Here we generate values of y from 0 to 10 and call the built-in pow() function passing 2 arguments: x and
y. On each call the current values of x and y are substituted into the call and the result is printed.
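The loop described above might look like this; it is a sketch with x fixed at 2 as an assumption.

```python
x = 2
for y in range(0, 11):      # y takes the values 0 to 10
    print(pow(x, y))        # built-in pow(x, y) is x raised to the power y
```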
Python: dir(m)
Another useful function built into Python is dir which, when passed the name of a module, gives back a
list of valid names (often functions) in that module. Try it on the built-in functions:
print dir(__builtins__)
Note: To use it on any other module you need to import the module first, otherwise Python will complain
that it doesn't recognise the name.
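For example, a quick sketch using the standard math module:

```python
import math                 # dir() needs the module imported first

names = dir(math)           # a list of the names defined in the math module
print("sqrt" in names)      # True
```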
Before doing much else we'd better talk about Python modules in a bit more detail.
Using Modules
Python is an extremely extendable language (as indeed is Tcl) in that you can add new capabilities by
importing modules. We'll see how to create modules shortly but for now we'll play with some of the
standard modules that ship with Python.
sys
We met sys already when we used it to exit from Python. It has a whole bunch of other useful functions
too. To gain access to these we must import sys:
You can import and use any of Python's modules in this way, and that includes modules you create yourself.
We'll see how to do that in a moment. First though, I'll give you a quick tour of some of Python's standard
modules and some of what they offer. The time module, for example, provides:
• time - get the current time (expressed in seconds)
• gmtime - convert time in secs to UTC (GMT)
• localtime - convert to local time instead
• mktime - inverse of localtime
• sleep - pause program for n seconds
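A short sketch exercising the functions listed above; time.strftime and the particular format string passed to it are just one way of displaying the result.

```python
import time

now = time.time()                 # seconds since the epoch, as a float
local = time.localtime(now)       # broken down into year, month, day, hour...
utc = time.gmtime(now)            # the same moment expressed in UTC (GMT)
print(time.strftime("%Y-%m-%d %H:%M:%S", local))

seconds = time.mktime(local)      # mktime turns local time back into seconds
print(int(seconds) == int(now))   # True: only the fraction of a second is lost

time.sleep(0.5)                   # pause for half a second
```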
These are just the tip of the iceberg. There are literally dozens of modules provided with Python, and as
many again that you can download. (A good source is the Vaults of Parnassus.) Look at the documentation
to find out how to do internet programming, graphics, build databases etc.
The important thing to realize is that most programming languages have these basic functions either built in
or as part of their standard library. Always check the documentation before writing a function - it may
already be there! Which leads us nicely into defining our own functions.
So let's create a function that can print out a multiplication table for us for any value that we provide as an
argument. In BASIC it looks like:
Note: We defined a parameter called N% and passed an argument of 7. The local variable N% inside the
function took the value 7 when we called it. We can define as many parameters as we want in the function
definition and the calling programs must provide values for each parameter. Some programming languages
allow you to define default values for a parameter so that if no value is provided the function assumes the
default. We'll see this in Python later.
In Python the same function looks like this:
def times(n):
    for i in range(1,13):
        print "%d x %d = %d" % (i, n, i*n)
Note that these functions do not return any values (they are really what some languages call procedures). In
fact notice that the BASIC version actually uses the keyword SUB rather than FUNCTION. This stands for
subroutine, a little-used term from the age of assembler programming that in BASIC means a function that
does not return a value. Python, by contrast, uses the keyword def, short for 'define', and whatever
follows is taken to be a function.
Recall that I mentioned the use of default values? One sensible use for these would be in a function which
returned the day of the week. If we call it with no value we mean today, otherwise we provide a day number
as an argument. Something like this:
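Here is a sketch of such a function in Python, under a few assumptions: the name day_of_week is hypothetical, day numbers follow Python's convention of 0 for Monday, and today's number comes from the tm_wday field of time.localtime(). The time module is imported only when the default value is actually used.

```python
def day_of_week(day_num=None):
    """Return the name of day day_num (0 = Monday); with no argument, today's name."""
    days = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
    if day_num is None:                     # the default value was used
        import time                         # only needed in this case
        day_num = time.localtime().tm_wday  # tm_wday counts from 0 = Monday
    return days[day_num]

print(day_of_week(2))    # Wednesday
print(day_of_week())     # today's name
```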
Note: We only need to use the time module if the default parameter value is involved, therefore we defer the
import operation until we need it. This would provide a slight performance improvement if we never had to
use the default value feature of the function.
What if we wanted to define a function that just returned the values of the multiplication as an array of
numbers? In BASIC it looks like:
And in Python:
def times(n):
    # create new empty list
    values = []
    for i in range(1,13):
        values.append(i*n)
    return values
This would be pretty dumb, because it's easier to just calculate i*n on demand. But hopefully you see the
idea. A more practical function which returns a value might be one which counts the words in a string. You
could use that to count the words in a file by adding the totals for each line together.
The code for that might look something like this:
def numwords(s):
    list = split(s) # list with each element a word
    return len(list) # return number of elements in list
Now if you tried it, you'll know that it didn't work. What I've done is a common design technique: sketch
out how I think the code should look without bothering to use the absolutely correct code. This is
sometimes known as Pseudo Code or, in a slightly more formal style, Program Description Language (PDL).
Once we've had a closer look at file and string handling, a little later in the course, we'll come back to this
example and do it for real.
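For reference, one working version needs only the string's own split() method, which splits on runs of whitespace. This is a sketch, not the version developed later in the course; count_file_words is a hypothetical helper that totals the counts line by line, so it works on a list of lines or on an open file.

```python
def numwords(s):
    """Count the words in s, treating any run of whitespace as a separator."""
    words = s.split()          # str.split() with no argument splits on whitespace
    return len(words)

def count_file_words(lines):
    """Total the word counts of each line (a list of strings or an open file)."""
    total = 0
    for line in lines:
        total = total + numwords(line)
    return total

print(numwords("Hello there, world"))          # 3
print(count_file_words(["one two", "three"]))  # 3
```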
Tcl Functions
We can also create functions in Tcl, of course, and we do so using the proc command, like so:
Note that by using the Tcl lappend list command I automatically create a list called results and start
adding elements to it.
A Word of Caution
Tcl is a little different in the way it deals with functions. In fact you may have noticed that I have been
calling the builtin Tcl functions commands. That's because in Tcl every command you type at the Tcl prompt
is actually a function call. Most languages come with a set of keywords like for, while, if/else
and so on. Tcl makes all of these control keywords commands or functions. This has the interesting,
confusing and very powerful effect of allowing us to redefine builtin control structures like this:
set i 3
while {$i < 10} {
    puts $i
    set i [expr $i + 1]
}
As expected this prints out the numbers from 3 to 9 (1 less than 10). But let's now define our own version of
the while command:
proc while {x y} {
puts "My while now"
}
set i 3
while {$i < 10} {
puts $i
set i [expr $i + 1]
}
This does nothing but print the message "My while now". The expression and command sequence are
ignored because Tcl treats them as parameters to the while function and the while function expects them but
ignores them! So you can see how we define procedures in Tcl and how we can abuse that to create very
confusing programs - don't do it unless you have a very good reason!
Python Modules
A module in Python is nothing special. It's just a plain text file full of Python program statements. Usually
these statements are function definitions. Thus when we type:
import sys
we effectively copy the contents of the sys module into our program, almost like a cut n' paste operation. (It's not
really like that but the concept is OK.) In fact in some programming languages (notably C++) the translator
literally does simply copy module files into the current program as required.
So to recap, we create a module by creating a Python file containing the functions we want to reuse in other
programs. Then we just import our module exactly like we do the standard modules. Easy eh? Let's do it.
Copy the function below into a file by itself and save the file with the name timestab.py
def print_table(multiplier):
    print "--- Printing the %d times table ---" % multiplier
    for n in range(1,13):
        print "%d x %d = %d" % (n, multiplier, n*multiplier)
Important Note: If you didn't start Python from the same directory where you stored the timestab.py file then
Python might not have been able to find the file and reported an error. If so then you can create an
environment variable called PYTHONPATH that holds a list of valid directories to search for modules (in
addition to the standard modules supplied with Python).
Creating environment variables is a platform specific operation which I assume you either know how to do
or can find out!
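Setting PYTHONPATH is one option; a program can also add a folder to the module search path itself through the sys.path list. The sketch below assumes a temporary folder and a hypothetical variant of the module whose make_table() function returns the lines of the table instead of printing them, so that it runs unchanged under Python 2 or 3:

```python
import os
import sys
import tempfile

# Source for a hypothetical variant of timestab.py: make_table() returns
# the lines of the table as a list instead of printing them.
MODULE_SOURCE = """
def make_table(multiplier):
    lines = ["--- Printing the %d times table ---" % multiplier]
    for n in range(1, 13):
        lines.append("%d x %d = %d" % (n, multiplier, n * multiplier))
    return lines
"""

# Save the module into a fresh folder that Python does not search by default
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "timestab.py"), "w") as f:
    f.write(MODULE_SOURCE)

# Appending the folder to sys.path has the same effect as listing it in PYTHONPATH
sys.path.append(folder)

import timestab

for line in timestab.make_table(7):
    print(line)
```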
Modules in BASIC and Tcl
What about BASIC? That's more complex.... In QBASIC and other older varieties there is no real module
concept. You have to manually cut n' paste from previous projects into your current one using your text
editor. However in Visual Basic there is a module concept and you can load a module via the Integrated
Development Environment (IDE) File|Open Module... menu. There are a few restrictions as to
what kind of things you can do inside a BASIC module but since we're not using Visual Basic on this course
I won't go into that any further. (Note: there is a cut down version of Visual Basic known as the COM
Controls Edition, CCE, available for free download on Microsoft's website if you feel like experimenting.
Also Windows 98, 2000 and IE5 all install a cut down version of VB called VBScript which you can use in
files ending .vbs)
Finally Tcl, as ever(!), takes a somewhat eclectic, but nonetheless interesting, path with regard to reusing
modules (or, as it prefers to call them, libraries).
At the simplest level you can just create a file of Tcl functions as we do in Python and then, in your program,
source the file. This literally causes the interpreter to read your file and those functions become available
for use. But there is a more interesting option:
You can create your files as above, put them all in a directory/folder and then run the auto_mkindex command.
This builds an index of all the functions and files in the folder. Then in your program you simply call the
required function and the Tcl interpreter will realize the function is not available and automatically look in
the index file. It will then source the relevant source file from the library and execute the function.
Once sourced the function stays available so there is little performance overhead involved. The only snag is
that the programmer must avoid having more than one function with the same name. This feature of Tcl is
known as autoloading.
Next we'll take a look at files and text handling and then as promised revisit the business of counting words
in a file. In fact we're eventually going to create a module of text handling functions for our convenience.
Things to remember
• Functions are a form of module
• Functions return values, procedures don't
• Python modules normally consist of function definitions in a file
• Create new functions with the def keyword in Python
• Use SUB or FUNCTION in BASIC and proc in Tcl
Handling Files and Text
What will we cover?
• How to open a file
• How to read and write to an open file
• How to close a file
• Building a word counter
Handling files often poses problems for beginners although the reason for this puzzles me. Files in a
programming sense are no different from files that you use in a word processor or other application: you
open them, do some work and then close them again.
The first big difference is that in a program you usually access the file sequentially, that is, you read one line at a
time starting at the beginning. In practice the word processor often does the same; it just holds the entire file
in memory while you work on it and then writes it all back out when you close it. The other difference is that
you normally open the file as read only or write only. You can write by creating a new file from scratch (or
overwriting an existing one) or by appending to an existing one.
One other thing you can do while processing a file is go back to the beginning.
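In Python the file object's seek() method does this: seek(0) moves the reading position back to the start of the file. A small sketch (runs under Python 2 or 3), assuming a throwaway file created just for the example:

```python
import os
import tempfile

# Create a small sample file to experiment with
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
out = open(path, "w")
out.write("first line\nsecond line\n")
out.close()

inp = open(path, "r")
first = inp.readline()    # reads "first line\n"
second = inp.readline()   # reads "second line\n"
inp.seek(0)               # go back to the beginning of the file
again = inp.readline()    # reads "first line\n" once more
inp.close()

print(again == first)
```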
Now we will write a program to read the file and display the output - like the 'cat' command in Unix or the
'type' command in DOS.
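A minimal sketch of such a program follows. It assumes a file called menu.txt; so that it runs on its own, it first writes a small hypothetical menu.txt into a temporary folder. It then opens the file in read mode, reads all the lines with readlines() and closes the file again (print() with one argument works in Python 2 and 3):

```python
import os
import tempfile

# Hypothetical sample data so the sketch is self-contained
filename = os.path.join(tempfile.mkdtemp(), "menu.txt")
out = open(filename, "w")
out.write("spam & eggs\nspam & chips\nspam & spam\n")
out.close()

# First open the file to read(r)
inp = open(filename, "r")
# read the file into a list, then print each item
lines = inp.readlines()
for line in lines:
    print(line.rstrip("\n"))   # each line already ends with a newline
# Now close it again
inp.close()
```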
Note 1: open() takes two arguments. The first is the filename (which may be passed as a variable or a
literal string, as we did here). The second is the mode. The mode determines whether we are opening the file
for reading ("r") or writing ("w"), and also whether it's for ASCII text or binary usage - by adding a 'b' to the 'r' or
'w', as in: open(fn,"rb")
Note 2: We read and close the file using functions preceded by the file variable. This notation is known as
method invocation and is our first glimpse of Object Orientation. Don't worry about it for now, except to
realize that it's related in some ways to modules. You can think of a file variable as being a reference to a
module containing functions that operate on files and which we automatically import every time we create a
file type variable.
Consider how you could cope with long files. First of all you would need to read the file one line at a time
(in Python by using readline() instead of readlines()). You might then use a line_count variable
which is incremented for each line then tested to see whether it is equal to 25 (for a 25 line screen). If so, you
request the user to press a key (enter say) before resetting the line_count to zero and continuing. You might
like to try that as an exercise...
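If you want to compare notes afterwards, here is one possible sketch of that exercise. The pause step is passed in as a function parameter (an assumption of this sketch) so the same code can wait for a real key press or, as in the demonstration, simply count the pauses:

```python
import os
import tempfile

def page_file(filename, pause, page_size=25):
    """Print a file one line at a time, calling pause() after every full page."""
    inp = open(filename, "r")
    line_count = 0
    pages = 0
    line = inp.readline()
    while line:                      # readline() returns "" at end of file
        print(line.rstrip("\n"))
        line_count = line_count + 1
        if line_count == page_size:  # the screen is full
            pause()                  # e.g. wait for the user to press Enter
            pages = pages + 1
            line_count = 0           # reset and carry on with the next page
        line = inp.readline()
    inp.close()
    return pages

# Demonstration: a hypothetical 60-line file and a pause() that just records calls
path = os.path.join(tempfile.mkdtemp(), "long.txt")
out = open(path, "w")
for i in range(60):
    out.write("line %d\n" % (i + 1))
out.close()
pauses = []
page_count = page_file(path, lambda: pauses.append(1))
```

In an interactive program you would pass something like lambda: raw_input("Press Enter...") (input() in Python 3) as the pause argument.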
Really that's all there is to it. You open the file, read it in and manipulate it any way you want to. When
you're finished you close the file. To create a 'copy' command in Python, we simply open a new file in write
mode and write the lines to that file instead of printing them. Like this:
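A sketch of that copy program, assuming hypothetical file names menu.txt and menu.bak in a temporary folder (runs under Python 2 or 3):

```python
import os
import tempfile

def copy_file(source, destination):
    """Copy a text file line by line, like a simple 'copy' command."""
    inp = open(source, "r")        # source is opened for reading
    outp = open(destination, "w")  # destination is created (or overwritten)
    for line in inp.readlines():
        outp.write(line)           # write each line instead of printing it
    inp.close()
    outp.close()
    print("%s copied to %s" % (source, destination))

# Demonstration using a throwaway folder
folder = tempfile.mkdtemp()
original = os.path.join(folder, "menu.txt")
backup = os.path.join(folder, "menu.bak")
out = open(original, "w")
out.write("spam & eggs\nspam & chips\n")
out.close()
copy_file(original, backup)
```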
Did you notice that I added a print statement just to reassure the user that something actually happened? This
kind of user feedback is usually a good idea.
One final twist is that you might want to append data to the end of an existing file. One way to do that would
be to open the file for input, read the data into a list, append the data to the list and then write the whole list
out to a new version of the old file. If the file is short that's not a problem but if the file is very large, maybe
over 100MB, then you may simply run out of memory to hold the list. Fortunately there's another mode, "a",
that we can pass to open() which allows us to append directly to an existing file just by writing. Even
better, if the file doesn't exist it will open a new file just as if you'd specified "w".
As an example, let's assume we have a log file that we use for capturing error messages. We don't want to
delete the existing messages so we choose to append the error, like this:
def logError(msg):
    err = open("Errors.log","a")
    err.write(msg + "\n")   # end each message with a newline
    err.close()
In the real world we would probably want to limit the size of the file in some way. A common technique is
to create a filename based on the date, thus when the date changes we automatically create a new file and it
is easy for the maintainers of the system to find the errors for a particular day and to archive away old error
files if they are not needed.
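A sketch of the date-based approach: the file name includes today's date from time.strftime(), so a new log starts automatically each day. The Errors-YYYY-MM-DD.log naming pattern and the folder parameter are assumptions made for the example:

```python
import os
import tempfile
import time

def logError(msg, folder="."):
    """Append msg to a log file named after today's date."""
    filename = os.path.join(folder, "Errors-%s.log" % time.strftime("%Y-%m-%d"))
    err = open(filename, "a")   # "a" appends, creating the file if needed
    err.write(msg + "\n")       # one message per line
    err.close()
    return filename

# Demonstration in a throwaway folder
log_folder = tempfile.mkdtemp()
logfile = logError("disk full", log_folder)
logError("printer on fire", log_folder)
```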
Counting Words
Now let's revisit that word counting program I mentioned in the previous section. Recall the Pseudo Code
looked like:
def numwords(s):
    list = split(s) # list with each element a word
    return len(list) # return number of elements in list
Now that we know how to get the lines from the file, let's consider the body of the numwords() function. First
we want to create a list of words in a line. By looking at the Python reference documentation for the
string module we see there is a function called split which separates a string into a list of fields separated
by whitespace (or any other character we define). Finally, by again referring to the documentation we see
that the builtin function len() returns the number of elements in a list, which in our case should be the
number of words in the string - exactly what we want.
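import string
def numwords(s):
    words = string.split(s) # need to qualify split() with string module
    return len(words)       # return number of elements in list

inp = open("menu.txt","r")
total = 0 # initialise to zero; also creates variable
# accumulate the totals for each line
for line in inp.readlines():
    total = total + numwords(line)
print "File had %d words" % total
inp.close()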
That's not quite right of course because it counts the '&' character as a word (although maybe you think it
should...). Also, it can only be used on a single file (menu.txt). But it's not too hard to convert it to read the
filename from the command line (sys.argv[1]) or via raw_input() as we saw in the 'Talking to the
user' section. I leave that as an exercise for the reader.
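One way to stop the '&' being counted is to treat a token as a word only if it contains at least one letter or digit. A sketch under that assumption (is_word() and count_words() are names invented for the example), written to run under Python 2 or 3:

```python
import os
import tempfile

def is_word(token):
    """Treat a token as a word only if it has at least one letter or digit."""
    for ch in token:
        if ch.isalnum():
            return True
    return False

def count_words(filename):
    """Count the words on every line of filename, ignoring lone symbols like '&'."""
    total = 0
    inp = open(filename, "r")
    for line in inp.readlines():
        for token in line.split():
            if is_word(token):
                total = total + 1
    inp.close()
    return total

# Demonstration with a hypothetical menu file
sample = os.path.join(tempfile.mkdtemp(), "menu.txt")
out = open(sample, "w")
out.write("spam & eggs\nspam & chips\nspam & spam\n")
out.close()
print(count_words(sample))
```

Because the filename is now a parameter, a script could pass in sys.argv[1] or a name read with raw_input().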
BASIC Version
BASIC uses a concept called streams to identify files. These streams are numbered which can make BASIC
file handling tedious. This can be avoided by using a handy function called FREEFILE which returns the next free
stream number. If you store this in a variable you never need to get confused about which stream/file has
which number.
INFILE = FREEFILE
OPEN "TEST.DAT" FOR INPUT AS INFILE
REM Check for EndOfFile(EOF) then
REM read line from input and print it
DO WHILE NOT EOF(INFILE)
    LINE INPUT #INFILE, theLine$
    PRINT theLine$
LOOP
CLOSE #INFILE
Tcl Version
close $infile
Things to remember
• Open files before using them
• Files can usually only be read or written but not both at the same time
• Python's readlines() function reads all the lines in a file, while readline() only
reads one line at a time, which may help save memory.
• Close files after use.
Handling Errors
What will we cover?
• 2 ways of dealing with errors
• raising errors in our code for others to catch
In either case it's up to the programmer to check to see whether an error has occurred and take appropriate
action.
This can result in production quality programs where over half of the code is taken up with testing every
action for success. This is cumbersome and makes the code hard to read (but in practice it's how the majority
of programs today work). A consistent approach is essential if silly mistakes are to be avoided.
try:
    # program logic goes here
except ExceptionType:
    # exception processing for named exception goes here
except AnotherType:
    # exception processing for a different exception goes here
else:
    # here we tidy up if NO exceptions are raised
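A concrete instance of that template, as a sketch: the hypothetical safe_divide() converts two strings to numbers and divides them, with a separate except clause for each kind of failure and an else clause that runs only when nothing went wrong:

```python
def safe_divide(numerator_text, denominator_text):
    try:
        # program logic: the conversions and the division may all fail
        result = float(numerator_text) / float(denominator_text)
    except ValueError:
        # one of the strings was not a number
        message = "Please type numbers only"
    except ZeroDivisionError:
        # a different exception type gets its own handler
        message = "Cannot divide by zero"
    else:
        # only reached when the try block raised nothing
        message = "Result is %.2f" % result
    return message

print(safe_divide("42", "8"))
print(safe_divide("42", "0"))
print(safe_divide("42", "lots"))
```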
There is another type of 'exception' block which allows us to tidy up after an error; it's called a
try...finally block and typically is used for closing files, flushing buffers to disk etc. The finally
block is always executed last regardless of what happens in the try section.
try:
    # normal program logic
finally:
    # here we tidy up regardless of the
    # success/failure of the try block
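A typical use is making sure a file always gets closed. In this sketch (first_line() and the sample file are invented for the example) the finally clause runs even though the try block leaves the function with return:

```python
import os
import tempfile

def first_line(filename):
    """Return the first line of filename, always closing the file."""
    inp = open(filename, "r")
    try:
        return inp.readline()   # normal program logic
    finally:
        inp.close()             # runs whether the try block returns or raises

# Demonstration with a throwaway file
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
out = open(path, "w")
out.write("hello\nworld\n")
out.close()
print(first_line(path))
```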
In this case x doesn't exist so we can't unset it. Tcl raises an exception but the catch prevents the
program from aborting and instead puts the error message into the msg variable and returns a non-zero value
(which can be defined by the programmer). You can then test the return value of catch in errorcode. If it
is non-zero then an error occurred and you can examine the msg variable.
BASIC doesn't quite support exceptions but does have a construct which helps to keep the code clear:
Note the use of line numbers. This was common in older programming languages including early BASIC.
Now you can do the same thing with labels:
Notice the RESUME NEXT statements which allow us to return to just after the error and carry on with the
program.
Generating Errors
What happens when we want to generate exceptions for other people to catch, in a module say? In that case
we use the raise keyword in Python:
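numerator = 42
denominator = input("What value will I divide 42 by?")
if denominator == 0:
    raise ZeroDivisionError("zero denominator")
This raises a ZeroDivisionError exception, carrying the message "zero denominator", which can be caught by a
try/except block. (Very early versions of Python let you raise a plain string, but current versions reject that.)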
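The other half of the picture is the caller catching what the module raises. A sketch of both sides, assuming a hypothetical ZeroDenominatorError class defined alongside a divide() function (runs under Python 2.6+ or 3):

```python
class ZeroDenominatorError(Exception):
    """Hypothetical error type raised by divide() below."""
    pass

def divide(numerator, denominator):
    if denominator == 0:
        # Generate an exception for the caller to deal with
        raise ZeroDenominatorError("zero denominator")
    return numerator / float(denominator)

# The caller decides what to do about the error
try:
    answer = divide(42, 0)
except ZeroDenominatorError as error:
    answer = None
    print("Caught: %s" % error)
```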
Tcl's Error Mechanism
In Tcl the return statement takes an optional -code flag which gets caught by any enclosing catch:
err should have the value 3 and msg the value 7. Once again a case where Tcl's syntax is less intuitive than
it might have been.
ERRORS:
IF ERR = 142 THEN
PRINT "Error 142 found"
STOP
ELSE
PRINT "No error found"
STOP
END IF
Things to remember
• Check error codes using an if statement
• Catch exceptions with an except clause
• Generate exceptions using the raise keyword
• Errors are exception objects (very old Python versions also allowed a simple string)
Advanced Topics
Recursion
Note: This is a fairly advanced topic and for most applications you don't need to know anything
about it. Occasionally, it is so useful that it is invaluable, so I present it here for your study. Just
don't panic if it doesn't make sense straight away.
What is it?
Despite what I said earlier about looping being one of the cornerstones of programming it is in fact possible
to create programs without an explicit loop construct. Some languages, such as Lisp, traditionally rely on a
technique known as recursion rather than explicit loop constructs like FOR, WHILE, etc. This
turns out to be a very powerful technique for some types of problem, so we'll take a look at it now.
Recursion simply means applying a function as a part of the definition of that same function. Thus the
definition of GNU (the source of much free software) is said to be recursive because GNU stands for 'GNU's
Not Unix', i.e. GNU is part of the definition of GNU!
The key to making this work is that there must be a terminating condition such that the function branches
to a non-recursive solution at some point. (The GNU definition fails this test and so gets stuck in an infinite
loop).
Let's look at a simple example. The mathematical factorial function is defined as being the product of all the
numbers up to and including the argument, and the factorial of 1 is 1. Thinking about this, we see that
another way to express this is that the factorial of N is equal to N times the factorial of (N-1).
Thus:
1! = 1
2! = 1 x 2 = 2
3! = 1 x 2 x 3 = 2! x 3 = 6
N! = 1 x 2 x 3 x .... (N-2) x (N-1) x N = (N-1)! x N
def factorial(n):
    if n == 1:
        return 1
    else:
        return n * factorial(n-1)
Now because we decrement N each time and stop when N equals 1, the function must complete (provided we start with N of 1 or more).
Writing the factorial function without recursion involves quite a bit more code. You need to create a list of
all the numbers from 1 to N then loop over that list multiplying the current total by the next item. Try it as an
exercise and compare the result to the function above.
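One possible answer to that exercise, for comparison once you have tried it yourself. It follows the approach just described (build the list of numbers, then multiply a running total) and can be checked against the recursive version:

```python
def factorial_loop(n):
    """Non-recursive factorial: multiply a running total by 1, 2, ..., n."""
    numbers = range(1, n + 1)   # all the numbers from 1 to N
    total = 1
    for item in numbers:
        total = total * item
    return total

def factorial(n):
    # The recursive version from the text, for comparison
    if n == 1:
        return 1
    else:
        return n * factorial(n - 1)

print(factorial_loop(5))
```

Unlike the recursive factorial, factorial_loop(0) returns 1, because the loop body never runs.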
Consider the trivial case of printing each element of a list of strings using a function printList:
def printList(L):
    if L:
        print L[0]
        # for [1:] - see 'slicing' in the Python reference manual
        printList(L[1:])
If L is true - non empty - we print the first element then process the rest of the list.
For a simple list that's a trivial thing using a simple loop. But consider what happens if the list is complex
and contains other lists within it. If we can test whether an item is a list then we can call printList()
recursively. If not we simply print it. Let's try that:
def printList(L):
    # if it's empty do nothing
    if not L: return
    # if it's a list call printList on 1st element
    if isinstance(L[0], list):
        printList(L[0])
    else: # no list so just print
        print L[0]
    # now process the rest of L
    printList(L[1:])
Now if you try to do that using a conventional loop construct you'll find it very difficult. Recursion makes a
very complex task comparatively simple.
There is a catch (of course!). Recursion on large data structures tends to eat up memory, so if you are short of
memory, or have very large data structures to process, the more complex conventional code may be safer.
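In Python, memory is not the only limit: CPython also caps how deeply functions may recurse (the default reported by sys.getrecursionlimit() is 1000) and raises an error when a program goes deeper. A sketch that demonstrates this (runs under Python 2 or 3):

```python
import sys

def depth(n):
    """Recurse n levels deep, then return n."""
    if n == 0:
        return 0
    return 1 + depth(n - 1)

limit = sys.getrecursionlimit()
shallow = depth(100)                # fine: well under the limit
try:
    depth(limit * 10)               # far deeper than the limit allows
    too_deep = False
except RuntimeError:                # Python 3's RecursionError subclasses RuntimeError
    too_deep = True
print("limit=%d, shallow=%d, too_deep=%s" % (limit, shallow, too_deep))
```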
OK, let's now take another leap into the unknown as we introduce Object Oriented Programming.
Object Oriented Programming
What is it?
Now we move onto what might have been termed an advanced topic up until about 5 years ago. Nowadays
Object Oriented Programming has become the norm. Languages like Java and Python embody the concept
so much that you can do very little without coming across objects somewhere. So what's it all about?
These increase in depth, size and academic exactitude as you go down the list. For most non professional
programmers' purposes the first is adequate. For a more programming focussed intro try Object Oriented
Programming by Timothy Budd(2nd edition). I haven't personally read this one but it gets rave reviews from
people whose opinions I respect. Finally for a whole heap of info on all topics OO try the Web link site at:
http://www.cetus.org
Assuming you have neither the time nor the inclination to research all these books and links right now, I'll give
you a brief overview of the concept. (Note: Some people find OO hard to grasp; others 'get it' right away.
Don't worry if you come under the former category, you can still use objects even without really 'seeing the
light'.)
One final point: we will only be using Python in this section since neither BASIC nor Tcl supports objects. It
is possible to implement an Object Oriented design in a non OO language through coding conventions, but
it's usually an option of last resort rather than a recommended strategy. If your problem fits well with OO
techniques then it's best to use an OO language.
For example a string object would store the character string but also provide methods to operate on that
string - search, change case, calculate length etc.
Objects use a message passing metaphor whereby one object passes a message to another object and the
receiving object responds by executing one of its operations, a method. So a method is invoked on receipt of
the corresponding message by the owning object. There are various notations used to represent this but the
most common mimics the access to fields in records - a period. Thus, for a fictitious widget class:
This would cause the paint method of the widget object to be invoked.
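To make the period notation concrete, here is a sketch with a hypothetical Widget class (invented for illustration) whose paint method simply records that it ran:

```python
class Widget:
    """Hypothetical widget class used only to illustrate message passing."""
    def __init__(self):
        self.painted = False

    def paint(self):
        # The method run when the widget receives the 'paint' message
        self.painted = True
        print("painting the widget")

w = Widget()   # create a new instance, w, of the Widget class
w.paint()      # send the message 'paint' to w
```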
Defining Classes
Just as data has various types so objects can have different types. These collections of objects with identical
characteristics are collectively known as a class. We can define classes and create instances of them, which
are the actual objects. We can store references to these objects in variables in our programs.
Let's look at a concrete example to see if we can explain it better. We will create a message class that
contains a string - the message text - and a method to print the message.
class Message:
    def __init__(self, aString):
        self.text = aString
    def printIt(self):
        print self.text
Note 1: One of the methods of this class is called __init__ and it is a special method called a
constructor. The reason for the name is that it is called when a new object instance is created or constructed.
Any variables assigned (and hence created in Python) inside this method will be unique to the new instance.
There are a number of special methods like this in Python, nearly all distinguished by the __xxx__
naming format.
Note 2: Both the methods defined have a first parameter self. The name is a convention but it indicates the
object instance. As we will see this parameter is filled in by the interpreter at run-time, not by the
programmer. Thus printIt is called with no arguments: m.printIt().
Note 3: We called the class Message with a capital 'M'. This is purely convention, but it is fairly widely
used, not just in Python but in other OO languages too. A related convention says that method names should
begin with a lowercase letter and subsequent words in the name begin with uppercase letters. Thus a method
called "calculate current balance" would be written: calculateCurrentBalance.
You may want to briefly revisit the 'Data' section and look again at 'user defined types'. The Python address
example should be a little clearer now. Essentially the only type of user defined type in Python is a class. A
class with attributes but no methods (except __init__) is effectively equivalent to a BASIC record.
Using Classes
Having defined a class we can now create instances of our Message class and manipulate them:
m1 = Message("Hello world")
m2 = Message("So long, it was short but sweet")
m1.printIt()
m2.printIt()
So in essence you just treat the class as if it was a standard Python data type, which was after all the purpose
of the exercise!
Same thing, Different thing
What we have so far is the ability to define our own types (classes) and create instances of these and assign
them to variables. We can then pass messages to these objects which trigger the methods we have defined.
But there's one last element to this OO stuff, and in many ways it's the most important aspect of all.
If we have two objects of different classes which support the same set of messages, each with its own
corresponding methods, then we can collect these objects together and treat them identically in our program
but the objects will behave differently. This ability to behave differently to the same input messages is
known as polymorphism.
Typically this could be used to get a number of different graphics objects to draw themselves on receipt of a
'paint' message. A circle draws a very different shape from a triangle but provided they both have a paint
method we, as programmers, can ignore the difference and just think of them as 'shapes'.
Let's look at an example, where instead of drawing shapes we calculate their areas:
import math

class Square:
    def __init__(self, side):
        self.side = side
    def calculateArea(self):
        return self.side**2

class Circle:
    def __init__(self, radius):
        self.radius = radius
    def calculateArea(self):
        return math.pi*(self.radius**2)
Now we can create a list of shapes (either circles or squares) and then print out their areas:
shapes = [Circle(5),Circle(7),Square(9),Circle(3),Square(12)]
for item in shapes:
    print item.calculateArea()
Now if we combine these ideas with modules we get a very powerful mechanism for reusing code. Put the
class definitions in a module - say 'shapes.py' and then simply import that module when we want to
manipulate shapes. This is exactly what has been done with many of the standard Python modules, which is
why accessing methods of an object looks a lot like using functions in a module.
Inheritance
Inheritance is often used as a mechanism to implement polymorphism. Indeed in many OO languages it is
the only way to implement polymorphism. It works as follows:
A class can inherit both attributes and operations from a parent or super class. This means that a new class
which is identical to another class in most respects does not need to reimplement all the methods of the
existing class, rather it can inherit those capabilities and then override those that it wants to do differently
(like the paint method in the case above).
Again an example might illustrate this best. We will use a class hierarchy of bank accounts where we can
deposit cash, obtain the balance and make a withdrawal. Some of the accounts provide interest (which, for
our purposes, we'll assume is calculated on every deposit - an interesting innovation to the banking world!)
and others charge fees for withdrawals.
The BankAccount class
Let's see how that might look. First let's consider the attributes and operations of a bank account at the most
general (or abstract) level.
It's usually best to consider the operations first then provide attributes as needed to support these operations.
So for a bank account we can:
• Deposit cash,
• Withdraw cash,
• Check current balance and
• Transfer funds to another account.
To support these operations we will need a bank account ID (for the transfer operation) and the current
balance.
class BankAccount:
    def __init__(self, initialAmount):
        self.balance = initialAmount
        print "Account created with balance %5.2f" % self.balance
    def checkBalance(self):
        return self.balance
Note 1: Notice that we check the balance before withdrawing, and that we use exceptions to handle errors. Of course
there is no built-in error type BalanceError so we needed to create one. (Very old Python versions let this be simply a
string variable; current versions require an exception class.)
Note 2: The transfer method uses the BankAccount's withdraw/deposit member functions or
methods to do the transfer. This is very common in OO and is known as self messaging. It means that
derived classes can implement their own versions of deposit/withdraw but the transfer method
can remain the same for all account types.
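The listing above shows only __init__ and checkBalance, while the notes also describe withdraw, deposit and transfer. A sketch of the complete class follows, written to run under Python 2 or 3. It assumes BalanceError is an Exception subclass, and the wording of the error message is invented for the example:

```python
class BalanceError(Exception):
    """Raised when a withdrawal would overdraw the account (assumed design)."""
    pass

class BankAccount:
    def __init__(self, initialAmount):
        self.balance = initialAmount
        print("Account created with balance %5.2f" % self.balance)

    def deposit(self, amount):
        self.balance = self.balance + amount

    def withdraw(self, amount):
        # Check the balance before withdrawing
        if self.balance >= amount:
            self.balance = self.balance - amount
        else:
            raise BalanceError("Sorry, you only have $%6.2f in your account" % self.balance)

    def checkBalance(self):
        return self.balance

    def transfer(self, amount, account):
        # Self messaging: reuse withdraw() and deposit() rather than changing
        # the balances directly, so subclasses can override those methods
        try:
            self.withdraw(amount)
            account.deposit(amount)
        except BalanceError as error:
            print(error)

a = BankAccount(500)
b = BankAccount(200)
a.withdraw(100)
a.transfer(100, b)
a.transfer(1000, b)   # fails: reports the error and leaves both balances alone
```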
The InterestAccount class
Now we use inheritance to provide an account that adds interest (we'll assume 3%) on every deposit. It will
be identical to the standard BankAccount class except for the deposit method. So we simply override that:
class InterestAccount(BankAccount):
    def deposit(self, amount):
        BankAccount.deposit(self,amount)
        self.balance = self.balance * 1.03
And that's it. We begin to see the power of OOP: all the other methods have been inherited from
BankAccount (by putting BankAccount inside the parentheses after the new class name). Notice also that
deposit called the superclass's deposit method rather than copying the code. Now if we modify the
BankAccount deposit to include some kind of error checking the sub-class will gain those changes
automatically.
The ChargingAccount class
This account is again identical to a standard BankAccount class except that this time it charges $3 for every
withdrawal. As for the InterestAccount we can create a class inheriting from BankAccount and modifying
the withdraw method.
class ChargingAccount(BankAccount):
    def __init__(self, initialAmount):
        BankAccount.__init__(self, initialAmount)
        self.fee = 3
    def withdraw(self, amount):
        BankAccount.withdraw(self, amount+self.fee)
Note 1: We store the fee as an instance variable so that we can change it later if necessary. Notice that we
can call the inherited __init__ just like any other method.
Note 2: We simply add the fee to the requested withdrawal and call the BankAccount withdraw method to
do the real work.
Note 3: We introduce a side effect here in that a charge is automatically levied on transfers too, but that's
probably what we want, so that's OK.
To check that it all works try executing the following piece of code (either at the Python prompt or by
creating a separate test file).
# First a standard BankAccount
a = BankAccount(500)
print "A = ", a.checkBalance()
# Now an InterestAccount
c = InterestAccount(1000)
c.deposit(100)
print "C = ", c.checkBalance()
# Then a ChargingAccount
d = ChargingAccount(300)
d.deposit(200)
print "D = ", d.checkBalance()
d.withdraw(50)
print "D = ", d.checkBalance()
d.transfer(100,a)
print "A = ", a.checkBalance()
print "D = ", d.checkBalance()
That's it. A reasonably straightforward example but it shows how inheritance can be used to quickly extend a
basic framework with powerful new features.
We've seen how we can build up the example in stages and how we can put together a test program to check
it works. Our tests were not complete in that we didn't cover every case and there are more checks we could
have included - like what to do if an account is created with a negative amount...
But hopefully this has given you a taste of Object Oriented Programming and you can move on to some of
the other online tutorials, or read one of the books mentioned at the beginning for more information and
examples.
Namespaces
Introduction
What's a namespace? I hear you ask. Well, it's kinda hard to explain. Not because they are especially
complicated, but because every language does them differently. The concept is pretty straightforward, a
namespace is a space or region, within a program, where a name (variable, class etc) is valid.
They came about because early programming languages (like BASIC) only had Global Variables, that is,
ones which could be seen throughout the program - even inside functions. This made maintenance of large
programs difficult since it was easy for one bit of a program to modify a variable without other parts of the
program realizing it - this was called a side-effect. To get round this, later languages (including modern
BASICs) introduced the concept of namespaces. (C++ has taken this further by allowing the
programmer to create their own named namespaces. This is useful for library creators
who might want to keep their function names unique when mixed with libraries provided by another
supplier.)
Python's approach
In Python every module creates its own namespace. To access those names we have to either precede them
with the name of the module or explicitly import the names we want to use into our module's namespace.
Nothing new there, we've been doing it with the sys and string modules already. In a sense a class
definition also creates its own namespace. Thus, to access a method or property of a class, we need to use the
name of the instance variable or the classname first.
So far so good. Now how does this come together when variables in different namespaces have the same
name? Or when a name not in the current namespace is referenced? Looking at the former situation first: If a
function refers to a variable called X and there exists an X within the function (local scope) then that is the
one that will be seen and used by Python. It's the programmer's job to avoid name clashes such that a local
variable and module variable of the same name are not both required in the same function - the local variable
will mask the global.
In general you should minimize the use of 'global' statements, it's usually better to pass the variable in as a
parameter and then return the modified variable.
The second point, where a name which is not within the current local namespace is referenced, is resolved as
follows: The function will look within its local namespace; if it can't find the name there it will then look at the
module scope and if not there then at the builtin scope. The only snag with this is when we want to assign a
value to the external variable. Normally this would create a new variable name, but we don't want that to
happen so once more we must specify it as global to ensure we don't create a local version of the name.
We can see all of this at work in this example (which is purely about illustrating the point!):
#tell function to look at module level and not create its own W
global W
if Z > W:
# print is a 'builtin-scope' name
print "2 x X is greater than X + 5"
return Z
else:
return Y # no local Y so uses module version
Now when we import a module such as sys we make the name sys available locally and then we can
access names within the sys module namespace by qualifying the name as we've seen. If we do:
from sys import exit
we only bring the exit function into the local namespace. We cannot use any other sys names, not even
sys itself.
Tcl
Tcl handles this differently. Variables created inside a procedure are local to that procedure, while variables
created at the top level of a script are global. A procedure cannot see a global variable unless it declares it
with the global command (much like Python's global statement), so the usual way to communicate between the two
is to pass the values in as parameters. (Tcl 8.0 and later also provide a namespace command for grouping names.)
Event Driven Programming
So far we have been looking at batch oriented programs. Recall that programs can be batch oriented,
whereby they start, do something then stop, or event driven where they start, wait for events and only stop
when told to do so - by an event. How do we create an event driven program? We'll look at this in two ways
- first we will simulate an event environment then we'll create a very simple GUI program that uses the
operating system and environment to generate events.
We will create a program that looks for precisely one type of event - keyboard input - and processes the
results until some quit event is received. In our case the quit event will be the space key. We will process the
incoming events in a very simple manner - we will simply print the ASCII code for that key. We'll use
BASIC for this because it has a nice, easy-to-use function for reading keys one at a time: INKEY$.
First we implement the main program body which simply starts up the event gathering loop and calls the
event handling subroutines when a valid event is detected.
' First, clear the screen of clutter then warn the user
' of what to do to quit
CLS
PRINT "Hit space to end..."
PRINT
Notice that what we do with the events is of no interest to the main body, it simply collects the events and
passes them to the event handlers. This independence of event capture and processing is a key feature of
event driven programming.
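The same separation of event capture from event handling can be sketched in Python. Reading single keystrokes portably is awkward in Python, so this sketch assumes the keystrokes arrive from a string instead of the keyboard; the handler names mirror the BASIC ones but are otherwise hypothetical.

```python
def do_key_event(key, log):
    # Handler for ordinary keys: record the character code (like PRINT ASC(...))
    log.append(ord(key))

def do_quit_event(key, log):
    # Handler for the quit event: record that we are stopping
    log.append("quit")

def event_loop(keystrokes):
    log = []
    for key in keystrokes:        # stands in for polling INKEY$ repeatedly
        # The loop only decides WHICH handler gets the event, never what it does
        if key == " ":
            do_quit_event(key, log)
            break
        do_key_event(key, log)
    return log

print(event_loop("ab1 zzz"))    # [97, 98, 49, 'quit']
```

Changing what happens to a keypress means changing only a handler; the loop itself stays the same.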
Now we can implement the two event handlers. The first, doKeyEvent, simply prints out the ASCII value
of the key pressed:
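SUB doKeyEvent (keypress AS STRING)
    ' find out how many characters INKEY$ returned
    length = LEN(keypress)
    IF length = 1 THEN ' it's simple ASCII
        PRINT ASC(keypress)
    ELSE
        IF length = 2 THEN
            ' it's non alphanumeric so use the 2nd char
            PRINT ASC(MID$(keypress, 2, 1))
        END IF
    END IF
END SUB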
If we were creating this as a framework for use in lots of projects we would probably include a call to an
initialisation function at the start and a cleanup function at the end. The programmer could then use the loop
part and provide his own initialisation, processing and cleanup functions.
That's exactly what most GUI type environments do, in that the loop part is embedded in the operating
environment or framework and applications are contractually required to provide the event handling
functions and hook these into the event loop in some way.
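A framework of that kind might look something like the following sketch. The function and parameter names (run, on_event, on_init, on_cleanup, quit_event) are hypothetical. The point is that the loop is written once and each application supplies its own functions.

```python
def run(events, on_event, on_init=None, on_cleanup=None, quit_event=" "):
    """A reusable event loop: the application supplies the handler functions."""
    if on_init:
        on_init()                  # application-specific start-up
    try:
        for event in events:       # stands in for waiting on real events
            if event == quit_event:
                break
            on_event(event)        # hand each event to the application
    finally:
        if on_cleanup:
            on_cleanup()           # always runs, even if a handler fails

# An "application" is now just a set of functions plugged into the loop
log = []

def start():
    log.append("<start>")

def handle(event):
    log.append(event.upper())

def finish():
    log.append("<end>")

run("hi there", handle, on_init=start, on_cleanup=finish)
print(log)    # ['<start>', 'H', 'I', '<end>']
```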
A GUI program
For this exercise we'll use the Python Tkinter toolkit. This is a Python wrapper around the Tk toolkit
originally written as an extension to Tcl and also available for Perl. The Python version is an object oriented
framework which is, in my opinion, considerably easier to work with than the original procedural Tk
version. I am not going to dwell much on the GUI aspects of this, rather I want to focus on the style of
programming - using Tkinter to handle the event loop and leaving the programmer to create the initial GUI
and then process the events as they arrive.
In the example we create an application class KeysApp which creates the GUI in the __init__ method
and binds the space key to the doQuitEvent method. The class also defines the required
doQuitEvent method.
The GUI itself simply consists of a text entry widget whose default behaviour is to echo characters typed
onto the display.
Creating an application class is quite common in OO event driven environments because there is a lot of
synergy between the concepts of events being sent to a program and messages being sent to an object. The
two concepts map on to each other very easily. An event handling function thus becomes a method of the
application class.
Having defined the class we simply create an instance of it and then send it the mainloop message.
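from Tkinter import *

class KeysApp(Frame):
    def __init__(self):
        # use the constructor to build the GUI
        Frame.__init__(self)
        self.txtBox = Text(self)
        self.txtBox.bind("<space>", self.doQuitEvent)
        self.txtBox.pack()
        self.pack()

    def doQuitEvent(self, event):
        import sys
        sys.exit()

# Now create an instance and start the event loop running
myApp = KeysApp()
myApp.mainloop()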
Of course in the BASIC version we printed the ASCII codes of all keys rather than only printing the
alphanumeric versions of printable keys as we do here. There's nothing to prevent us capturing all of the
keypresses and doing the same thing. To do so we would add the following line to the __init__ method:
        self.txtBox.bind("<Key>", self.doKeyEvent)
and the following method to the KeysApp class:
    def doKeyEvent(self, event):
        keyText = "%d\n" % event.keycode
        self.txtBox.insert(END, keyText)
        return "break"
Note 1: the key value is stored in the keycode field of the event. I had to look at the source code of
Tkinter.py to find that out... Remember, curiosity is a key attribute of a programmer!
Note 2: return "break" is a magic signal to tell Tkinter not to invoke the default event processing for
that widget. Without that line, the text box displays the ASCII code followed by the actual character typed,
which is not what we want here.
That's enough for now. This isn't meant to be a tutorial on Tkinter; that's the subject of the next topic. There
are also several books on using Tk and Tkinter.
GUI Programming with Tkinter
In this topic we look at how a GUI program is assembled in a general sense, then how this is done using
Python's native GUI toolkit, Tkinter. This will not be a full-blown Tkinter reference nor even a complete
tutorial. There is already a very good and detailed tutor linked from the Python web site. This tutorial will
instead try to lead you through the basics of GUI programming, introducing some of the basic GUI
components and how to use them. We will also look at how Object Oriented programming can help organise
a GUI application.
GUI principles
The first thing I want to say is that you won't learn anything new about programming here. Programming a
GUI is exactly like any other kind of programming; you can use sequences, loops, branches and modules just
as before. What is different is that in programming a GUI you usually use a Toolkit and must follow the
pattern of program design laid down by the toolkit vendor. Each new toolkit will have its own API and set of
design rules and you as a programmer need to learn these. This is why most programmers try to standardise
on only a few toolkits which are available across multiple languages - learning a new toolkit tends to be
much harder than learning a new programming language!
We are going to look at the Tk toolkit which is used in Tcl, Perl and Python. The principles in Tk are slightly
different to other toolkits so I will conclude with a very brief look at another popular GUI toolkit for
Python (and C/C++) which is more conventional in its approach. But first some general principles:
As we have already stated several times, GUI applications are nearly always event driven by nature. If you
don't remember what that means go back and look at the event driven programming topic.
I will assume that you are already familiar with GUIs as a user and will focus on how GUI programs work
from a programmer's perspective. I will not be going into details of how to write large complex GUIs with
multiple windows, MDI interfaces etc. I will stick to the basics of creating a single window application with
some labels, buttons, text boxes and message boxes.
First things first, we need to check our vocabulary. GUI programming has its own set of programming terms.
The most common terms are described in the table below:
Term Description
Window An area of the screen controlled by an application. Windows are usually
rectangular but some GUI environments permit other shapes. Windows can
contain other windows and frequently every single GUI control is treated as
a window in its own right.
Control A control is a GUI object used for controlling the application. Controls have
properties and usually generate events. Normally controls correspond to
application level objects and the events are coupled to methods of the
corresponding object such that when an event occurs the object executes one
of its methods. The GUI environment usually provides a mechanism for
binding events to methods.
Widget A control, sometimes restricted to visible controls. Some controls (such as
timers) can be associated with a given window but are not visible. Widgets
are that subset of controls which are visible and can be manipulated by the
user or programmer. The widgets that we shall cover are:
• Frame
• Label
• Button
• Text Entry
• Message boxes
The ones we won't discuss in this topic but are used elsewhere in the tutor are:
• Text box
• Radio Button
Frame A type of widget used to group other widgets together. Often a Frame is
used to represent the complete window and further frames are embedded
within it.
Layout Controls are laid out within a Frame according to a particular form of
Layout. The Layout may be specified in a number of ways: using on-screen
coordinates specified in pixels, using relative position to other
components (left, top etc.) or using a grid or table arrangement. A coordinate
system is easy to understand but difficult to manage when a window is
resized etc. Beginners are advised to use non-resizable windows if working
with coordinate based layouts.
Child GUI applications tend to consist of a hierarchy of widgets/controls. The top
level Frame comprising the application window will contain sub frames
which in turn contain still more frames or controls. These controls can be
visualised as a tree structure with each control having a single parent and a
number of children. In fact it is normal for this structure to be stored
explicitly by the widgets so that the programmer, or more commonly the
GUI environment itself, can often perform some common action on a control
and all its children.
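That tree can be sketched with plain Python objects. The Control class and its apply method below are hypothetical, not part of any toolkit. They only show how storing parent and children links lets one call act on a control and everything beneath it.

```python
class Control(object):
    """A hypothetical GUI control that records its place in the widget tree."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.enabled = True
        if parent is not None:
            parent.children.append(self)   # the parent keeps track of its children

    def apply(self, action):
        # Perform the action on this control, then on all of its descendants
        action(self)
        for child in self.children:
            child.apply(action)

def disable(control):
    control.enabled = False

window = Control("window")
buttons = Control("buttons", parent=window)
ok = Control("ok", parent=buttons)
cancel = Control("cancel", parent=buttons)
label = Control("label", parent=window)

buttons.apply(disable)   # disables the frame and both buttons, not the label
print([c.name for c in (window, buttons, ok, cancel, label) if not c.enabled])
# ['buttons', 'ok', 'cancel']
```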
>>> from Tkinter import *
This is the first requirement of any Tkinter program: import the names of the widgets. You could of course
just import the module, but it quickly gets tiring typing Tkinter in front of every component name.
>>> top = Tk()
This creates the top level widget in our widget hierarchy. All other widgets will be created as children of
this. Notice that a new blank window has appeared complete with an empty title bar and the usual set of
control buttons (iconify, maximise etc). We will now add components to this window as we build an
application.
>>> dir(top)
The dir function shows us what names are known to the argument. You can use it on modules but in this
case we are looking at the internals of the top object, an instance of the Tk class. These are the attributes of
top; note in particular the children and master attributes, which are the links to the widget
hierarchy. Note also the attribute _tclCommands; this is because, as you might recall, Tkinter is built on
a Tcl toolkit called Tk.
>>> F = Frame(top)
This creates a Frame widget which will in turn contain the child controls/widgets that we use. Frame specifies
top as its first (and in this case only) parameter, thus signifying that F will be a child widget of top.
>>> F.pack()
Notice that the Tk window has now shrunk to the size of the added Frame widget - which is currently empty
so the window is now very small! The pack() method invokes a Layout Manager known as the packer
which is very easy to use for simple layouts but becomes a little clumsy as the layouts get more complex.
We will stick with it for now because it's quite easy to use. Note that widgets will not be visible in our
application until we pack them (or use another Layout manager method).
>>> lHello = Label(F, text="Hello world")
Here we create a new object, lHello, an instance of the Label class, with a parent widget F and a text
attribute of "Hello world". Notice that because Tkinter object constructors tend to have many parameters
(each with default values) it is usual to use the named parameter technique of passing arguments to Tkinter
objects. Also notice that the object is not yet visible because we haven't packed it yet.
One final point to note is the use of a naming convention: I put a lowercase l, for Label, in front of a name,
Hello, which reminds me of its purpose. Like most naming conventions this is a matter of personal choice,
but I find it helps.
>>> lHello.pack()
Now we can see it. Hopefully yours looks quite a lot like this:
We can specify other properties of the Label such as the font and color using parameters to the object
constructor too. We can also access the corresponding properties using the configure method of Tkinter
widgets, like so:
>>> lHello.configure(text="Goodbye")
The message changed. That was easy, wasn't it? configure is an especially good technique if you need to
change multiple properties at once because they can all be passed as arguments. However, if you only want to
change a single property at a time, as we did above, you can treat the object like a dictionary, thus:
Labels are pretty boring widgets, they can really only display read-only text, albeit in various colors, fonts
and sizes. (In fact they can be used to display simple graphics too but I'll show you how to do that later).
Before we look at another object type there is one more thing to do and that's to set the title of the window.
We do that by using a method of the top level widget top:
>>> F.master.title("Hello")
We could have used top directly but as we'll see later access through the Frame's master property is a useful
technique.
>>> bQuit = Button(F, text="Quit", command=F.quit)
Here we create a new widget, a button. The button has a label "Quit" and is associated with the command
F.quit. Note that we pass the method name; we do not call the method by adding parentheses after it. This
means we must pass a function object in Python terms. It can be a built-in method provided by Tkinter, as
here, or any other function that we define. The function or method must take no arguments. The quit
method, like the pack method, is defined in a base class and is inherited by all Tkinter widgets.
>>> bQuit.pack()
>>> top.mainloop()
We start the Tkinter event loop. Notice that the Python >>> prompt has now disappeared. That tells us that
Tkinter now has control. If you press the Quit button the prompt will return, proving that our command
option worked.
Note that if you run this from Pythonwin or IDLE you may get a different result; if so, try typing the
commands so far into a Python script and running them from an OS command prompt.
In fact it's probably a good time to try that anyhow; after all, it's how most Tkinter programs will be run in
practice. Use the key commands from those we've discussed so far as shown:
top.mainloop()
The call to the top.mainloop method starts the Tkinter event loop generating events. In this case the
only event caught will be the button press event which is connected to the F.quit method. F.quit in
turn will terminate the application. Try it; it should look like this:
Exploring Layout
Note: Moving away from the >>> prompt, from now on I'll provide examples within Python script files.
In this section I want to look at how Tkinter positions widgets within a window. We have already seen
Frame, Label and Button widgets and those are all we need for this section. In the previous example we used
the pack method of the widget to locate it within its parent widget. Technically what we are doing is
invoking Tk's packer Layout Manager. The Layout Manager's job is to determine the best layout for the
widgets based on hints that the programmer provides plus constraints such as the size of the window as
controlled by the user. Some Layout managers use exact locations within the window, specified in pixels
normally, and this is very common in Microsoft Windows environments such as Visual Basic. Tkinter
includes a Placer Layout Manager which can do this too via a place method. I won't look at that in this tutor
because usually one of the other, more intelligent managers is a better choice: they relieve us, as programmers,
of the need to worry about what happens when a window is resized.
The simplest Layout Manager in Tkinter is the packer, which we've been using. The packer, by default, just
stacks widgets one on top of the other. That is very rarely what we want for normal widgets, but if we build
our applications from Frames then stacking Frames on top of each other is quite a reasonable approach. We
can then put our other widgets into the Frames using either the packer or another Layout Manager within each
Frame as appropriate. You can see an example of this in action in the Case Study topic.
Even the simple packer provides a multitude of options, however. For example we can arrange our widgets
horizontally instead of vertically by providing a side argument, like so:
lHello.pack(side="left")
bQuit.pack(side="left")
That will force the widgets to go to the left, so the first widget (the label) will appear at the extreme left
hand side, followed by the next widget (the Button). If you modify the lines in the example above it will look
like this:
And if you change the "left" to "right" then the Label appears on the extreme right and the Button to
the left of it, like so:
One thing you notice is that it doesn't look very nice because the widgets are squashed together. The packer
also provides us with some parameters to deal with that. The easiest to use is padding, which is specified in
terms of horizontal padding (padx) and vertical padding (pady). These values are specified in pixels. Let's
try adding some horizontal padding to our example:
lHello.pack(side="left", padx=10)
bQuit.pack(side='left', padx=10)
If you try resizing the window you'll see that the widgets retain their positions relative to one another but
stay centered in the window. Why is that, if we packed them to the left? The answer is that we packed them
into a Frame but the Frame was packed without a side, so it is positioned top, centre - the packer's default. If
you want the widgets to stay at the correct side of the window you will need to pack the Frame to the
appropriate side too:
F.pack(side='left')
Also note that the widgets stay centred if you resize the window vertically - again that's the packer's default
behaviour.
I'll leave you to play with padx and pady for yourself to see the effect of different values and combinations
etc. Between them, side and padx/pady allow quite a lot of flexibility in the positioning of widgets
using the packer. There are several other options, each adding another subtle form of control; please check
the Tkinter reference pages for details.
There are a couple of other layout managers in Tkinter, known as the grid and the placer. To use the grid
manager you use grid() instead of pack() and for the placer you call place() instead of pack().
Each has its own set of options and since I'll only cover the packer in this intro you'll need to look up the
Tkinter tutorial and reference for the details. The main points to note are that the grid arranges components
in a grid (surprise!) within the window - this can often be useful for dialog boxes with lined up text entry
boxes, for example. The placer uses either fixed coordinates in pixels or relative coordinates within a
window. The latter allows the component to resize along with the window - always occupying 75% of the
vertical space, say. This can be useful for intricate window designs but does require a lot of pre-planning - I
strongly recommend a pad of squared paper, a pencil and eraser!
F = Frame(top, relief="sunken", border=1)
Note 1: You need to provide a border too. If you don't, the Frame will be sunken but with an invisible border
- you won't see any difference!
Note 2: You don't put the border size in quotes. One of the confusing aspects of Tk programming
is knowing when to use quotes around an option and when to leave them out. In general if it's a numeric or
single character value you can leave the quotes off. If it's a mixture of digits and letters or a string then you
need the quotes. Likewise with which letter case to use. Unfortunately there is no easy solution; you just
learn from experience. Python often gives a list of the valid options in its error messages!
One other thing to notice is that the Frame doesn't fill the window. We can fix that with another packer
option called, unsurprisingly, fill. When you pack the frame, do it like this:
F.pack(fill="x")
This fills horizontally. If you want the frame to fill the entire window, use fill="both" instead, together with
expand=1 so that the frame also claims any extra space when the window grows.
Going back to our "Hello World" program we'll add a text entry widget inside a Frame of its own and a
button that can clear the text that we type into it. This will demonstrate not only how to create and use the
Entry widget but also how to define our own event handling functions and connect them to widgets.
Note that once more we pass the name of the event handler (evClear) as the command argument to the
bClear button. Note also the use of a naming convention, evXXX to link the event handler with the
corresponding widget.
Running the program yields this:
And if you type something in the text entry box then hit the "Clear Text" button it removes it again.
We'll now define a hot key - let's say Ctrl-c - to delete the text in the above example. To do that we need
to bind the Ctrl-c key combination to the same event handler as the Clear button. Unfortunately there's an
unexpected snag. When we use the command option the function specified must take no arguments. When
we use the bind function to do the same job the bound function must take one argument. Thus we need to
create a new function with a single parameter which calls evClear. Add the following after the evClear
definition:
def evHotKey(event):
    evClear()
And add the following line after the definition of the Entry widget:
Run the program again and you can now clear the text either by hitting the button or by typing Ctrl-c. We could
also use bind to capture things like mouse clicks, gaining or losing focus, or even the window becoming
visible. See the Tkinter documentation for more information on this. The hardest part is usually figuring out
the format of the event description!
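The difference between the two calling conventions can be shown without a GUI at all. In this sketch FakeEvent is a hypothetical stand-in for the event object that bind passes to its handler, evClear just records that it was called, and as_bind_handler is a hypothetical general-purpose helper that does the same job as evHotKey for any no-argument function.

```python
class FakeEvent(object):
    """Hypothetical stand-in for the event object Tkinter passes to bound handlers."""
    keysym = "c"

calls = []

def evClear():
    # Command-style handler: a Button calls it with no arguments
    calls.append("clear")

def evHotKey(event):
    # Bind-style handler: receives the event, then delegates to evClear
    evClear()

def as_bind_handler(func):
    """Hypothetical helper: wrap any no-argument function for use with bind."""
    def handler(event):
        return func()
    return handler

evClear()                              # as the Button would call it
evHotKey(FakeEvent())                  # as bind would call evHotKey
as_bind_handler(evClear)(FakeEvent())  # the generic wrapper does the same
print(calls)    # ['clear', 'clear', 'clear']
```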
A Short Message
You can report short messages to your users using a MessageBox. This is very easy in Tk and is
accomplished using the tkMessageBox module functions as shown:
import tkMessageBox
tkMessageBox.showinfo("Window Title", "A short message")
There are also error, warning, Yes/No and OK/Cancel boxes available via different
showXXX functions. They are distinguished by different icons and buttons. The latter
two use askXXX instead of showXXX and return a value to indicate which button the
user pressed, like so:
res = tkMessageBox.askokcancel("Which?", "Ready to stop?")
print res
The Tcl view
Since we've been comparing Python with Tcl throughout the early part of this tutor it seems sensible to show
you what the early Label and Button example looks like in the original Tcl/Tk form:
As you can see it is very concise. The widget hierarchy is formed by a naming convention with '.' as the top
level widget. As is usual in Tcl the widgets are commands with the properties passed as arguments.
Hopefully the translation of widget parameters to Python named arguments is fairly obvious. This means that
you can use the Tcl/Tk documentation (of which there is a lot!) to help solve problems with Tkinter
programming; mostly it's an obvious translation.
That's as far as I'm going with Tcl/Tk here. Before we finish though I'll show you a common technique for
bundling Tkinter GUI applications as objects.
I will convert the example above using an Entry field, a Clear button and a Quit button to an OO structure.
First we create an Application class and within the constructor assemble the visual parts of the GUI.
We assign the resultant Frame to self.mainWindow, thus allowing other methods of the class access to
the top level Frame. Other widgets that we may need to access (such as the Entry field) are likewise assigned
to member variables of the application object. Using this technique the event handlers become methods of the
application class and all have access to any other data members of the application (although in this case there
are none) through the self reference. This provides seamless integration of the GUI with the underlying
application objects:
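from Tkinter import *

class ClearApp:
    def __init__(self, parent=None):
        self.mainWindow = Frame(parent)
        # Create the entry widget
        self.entry = Entry(self.mainWindow)
        self.entry.insert(0, "Hello world")
        self.entry.pack(fill=X)
        # Now add the two buttons in a frame with a grooved border
        fButtons = Frame(self.mainWindow, borderwidth=2, relief="groove")
        bClear = Button(fButtons, text="Clear", width=8, command=self.clearText)
        bQuit = Button(fButtons, text="Quit", width=8, command=self.mainWindow.quit)
        bClear.pack(side="left", padx=15, pady=1)
        bQuit.pack(side="right", padx=15, pady=1)
        fButtons.pack(fill=X)
        self.mainWindow.pack()

    def clearText(self):
        self.entry.delete(0, END)

app = ClearApp()
app.mainWindow.mainloop()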
The result looks remarkably like the previous incarnation although I have tweaked the lower frame to give it
a nice grooved finish and I've supplied widths to the buttons to make them look more similar to the
wxPython example below.
Of course it's not just the main application that we can wrap up as an object. We could create a class based
around a Frame containing a standard set of buttons and reuse that class in building dialog windows, say. We
could even create whole dialogs and use them across several projects. Or we can extend the capabilities of
the standard widgets by subclassing them - maybe to create a button that changes colour depending on its
state. This is what has been done with the Python Mega Widgets (PMW) which is an extension to Tkinter
which you can download.
An alternative - wxPython
There are many other GUI toolkits available but one of the most popular is the wxPython toolkit which is, in
turn, a wrapper for the C++ toolkit wxWindows. wxPython is much more typical than Tkinter of GUI
toolkits in general. It also provides more standard functionality than Tk "out of the box" - things like tooltips,
status bars etc., which have to be hand-crafted in Tkinter. We'll use wxPython to recreate the simple "Hello
World" Label and Button example above.
I won't go through this in detail; if you do want to know more about how wxPython works you will need to
download the package from the wxPython website.
In general terms the toolkit defines a framework which allows us to create windows and populate them with
controls and to bind methods to those controls. It is fully object oriented so you should use methods rather
than functions. The example looks like this:
# --- Define a custom Frame, this will become the main window ---
class HelloFrame(wxFrame):
    def __init__(self, parent, ID, title, pos, size):
        wxFrame.__init__(self, parent, ID, title, pos, size)
        # we need a panel to get the right background
        panel = wxPanel(self, -1)
Points to note are the use of a naming convention for the methods that get called by the framework -
OnXXXX. Also note the EVT_XXX functions to bind events to widgets - there is a whole family of these.
wxPython has a vast array of widgets, far more than Tkinter, and with them you can build quite sophisticated
GUIs. Unfortunately they tend to use a coordinate based placement scheme which becomes very tedious
after a while. It is possible to use a scheme very similar to the Tkinter packer but it's not so well documented.
There is a commercial GUI builder available and hopefully someone will soon provide a free one too.
Incidentally it might be of interest to note that this and the very similar Tkinter example above have both got
exactly the same number of lines of executable code - 21.
In conclusion, if you just want a quick GUI front end to a text based tool then Tkinter should meet your
needs with minimal effort. If you want to build full featured cross platform GUI applications look more
closely at wxPython.
Other toolkits include MFC, pyQt and pyGTK; the latter is Linux only at present, although it could potentially be
ported to Windows too since the underlying GTK library already runs on Windows. Finally there is curses
which is a kind of text based GUI! Many of the lessons we've learned with Tkinter apply to all of these
toolkits but each has its own characteristics and foibles. Pick one, get to know it and enjoy the wacky world
of GUI design.
That's enough for now. This wasn't meant to be a Tkinter reference page, just enough to get you started. See
the Tkinter section of the Python web pages for links to other Tkinter resources.
There are also several books on using Tcl/Tk and at least one on Tkinter. I will however come back to
Tkinter in the case study, where I illustrate one way of encapsulating a batch mode program in a GUI for
improved usability.
Functional Programming
In this topic we look at how Python can support yet another programming style: Functional
Programming (FP). As with Recursion this is a genuinely advanced topic which you may wish to ignore for
the present. Functional techniques do have some uses in day to day programming and the supporters of FP
believe it to be a fundamentally better way to develop software.
Functional programming is all about expressions. In fact another way to describe FP might be to term it
expression oriented programming since in FP everything reduces to an expression. You should recall that an
expression is a collection of operations and variables that results in a single value. Thus x == 5 is a
boolean expression. 5 + (7-Y) is an arithmetic expression. And string.upper("Hello
world") is a string expression. The latter is also a function call within the string module
and, as we shall see, functions are very important in FP (You might already have guessed that from the
name!).
Functions are used as objects in FP. That is they are often passed around within a program in much the same
way as other variables. We have seen examples of this in our GUI programs where we assigned the name of
a function to the command attribute of a Button control. We treated the event handler function as an object
and assigned a reference to the function to the Button. This idea of passing functions around our program is
key to FP.
Finally FP tries to focus on the what rather than the how of problem solving. That is a functional program
should describe the problem to be solved rather than focus on the mechanism of solution. There are several
programming languages which aim to work in this way, one of the most widely used is Haskell and the
Haskell web site ( www.haskell.org) has numerous papers describing the philosophy of FP as well as the
Haskell language. (My personal opinion is that this goal, however laudable, is rather overstated by FP's
advocates.)
A pure functional program is structured by defining an expression which captures the intent of the program.
Each term of the expression is in turn a statement of a characteristic of the problem (maybe encapsulated as
another expression) and the evaluation of each of these terms eventually yields a solution.
Well, that's the theory. Does it work? Yes, sometimes it works very well. For some types of problem it is a
natural and powerful technique. Unfortunately for many other problems it requires a fairly abstract thinking
style, heavily influenced by mathematical principles. The resultant code is often far from readable to the
layman programmer. The resultant code is also very often much shorter than the equivalent imperative code
and more reliable. It is these latter qualities that have drawn many conventional imperative or object oriented
programmers to investigate FP. Even if not embraced wholeheartedly there are several powerful tools that
can be used by all.
We will look at some of the functions provided and see how they operate on some sample data structures
that we define as:
spam = ['pork','ham','spices']
numbers = [1,2,3,4,5]
def eggs(item):
    return item
map(aFunction, aSequence)
This function applies a Python function, aFunction to each member of aSequence. The expression:
L = map(eggs, spam)
print L
Results in a new list (in this case identical to spam) being returned in L.
We could have done the same thing by writing:
L = []
for i in spam:
    L.append(i)
print L
Notice, however, that the map function allows us to remove the need for a nested block of code. From one
point of view that reduces the complexity of the program by one level. We'll see that as a recurring theme of
FP, that use of the FP functions reduces the relative complexity of the code by eliminating blocks.
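An identity function does not show much, so here is a slightly more useful sketch that applies a transforming function. The double and add functions are hypothetical. The list() wrapper is only needed in Python 3, where map returns an iterator rather than a list; it is harmless in Python 2.

```python
numbers = [1, 2, 3, 4, 5]

def double(n):
    # The function map will apply to each element in turn
    return n * 2

doubled = list(map(double, numbers))
print(doubled)    # [2, 4, 6, 8, 10]

def add(a, b):
    return a + b

# map can also walk several sequences in step, taking one item from each
print(list(map(add, [1, 2, 3], [10, 20, 30])))    # [11, 22, 33]
```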
filter(aFunction, aSequence)
As the name suggests filter extracts each element in the sequence for which the function returns true.
Consider our list of numbers. If we want to create a new list of only odd nuimbers we can produce it like so:
Notice that the conventional code needs two levels of indentation to achieve the same result, and once more
the extra indentation is an indication of increased code complexity.
reduce(aFunction, aSequence)
The reduce function is a little less obvious in its intent. This function reduces a list to a single value by
combining elements via a supplied function. For example we could sum the values of a list and return the
total like this:
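In Python 3, reduce is no longer a built-in and has to be imported from the functools module. A minimal sketch of the summing example, using a small two-argument add function (a name chosen here for illustration):

```python
from functools import reduce   # reduce was a built-in function in Python 2

numbers = [1, 2, 3, 4, 5]

def add(i, j):
    # combine two values into one by adding them
    return i + j

total = reduce(add, numbers)   # evaluated as ((((1 + 2) + 3) + 4) + 5)
print(total)   # 15
```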
Done conventionally, the same total can be computed with an explicit loop:
res = 0
for i in range(len(numbers)): # use indexing
    res = res + numbers[i]
print res
Whilst that produces the same result in this case, it is not always so straightforward. What reduce actually
does is call the supplied function, passing it the first two members of the sequence, and replace those two
items with the result, repeating the process until only one value remains. In other words, a more accurate
representation of reduce is like this:
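One way to spell that process out, as a sketch rather than how reduce is actually implemented, is a loop that repeatedly replaces the first two items of a copy of the list with the result of combining them. The helper names add, sub and reduce_by_hand are illustrative; subtraction is included because it shows that the order of combination matters:

```python
from functools import reduce

def add(i, j):
    return i + j

def sub(i, j):
    return i - j

def reduce_by_hand(func, sequence):
    # Assumes a non-empty sequence; work on a copy so the caller's list is unchanged
    L = list(sequence)
    while len(L) >= 2:
        # combine the first two items and put the result back at the front
        L = [func(L[0], L[1])] + L[2:]
    return L[0]

numbers = [1, 2, 3, 4, 5]
print(reduce_by_hand(add, numbers))      # 15
print(reduce_by_hand(sub, [10, 3, 2]))   # 5, i.e. (10 - 3) - 2
print(reduce(sub, [10, 3, 2]))           # 5, the same as functools.reduce
```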
Once more we see the FP technique reducing the complexity of the code by avoiding the need for an
indented block of code.
lambda
One feature you may have noticed in the examples so far is that the functions passed to the FP functions tend
to be very short, often only a single line of code. To save the effort of defining lots of very small functions,
Python provides another aid to FP: lambda.
Lambda is a term used to refer to an anonymous function, that is, a block of code which can be executed as if
it were a function but without a name. Lambdas can be defined anywhere within a program where a legal
Python expression can occur, which means we can use them inside our FP functions.
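For example, the two-argument adding function needed to sum a list with reduce can be written as a lambda directly in the call. A short sketch for Python 3, so reduce again comes from functools:

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# A lambda can be bound to a name like any other function object...
add = lambda i, j: i + j
print(reduce(add, numbers))   # 15

# ...but more often it is written inline, exactly where the function is needed
print(reduce(lambda i, j: i + j, numbers))   # 15
```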
Similarly, we can rewrite our map and filter examples like so:
L = map(lambda i: i, spam)
print L
L = filter(lambda i: (i%2 != 0), numbers)
print L
Other constructs
Of course, while these functions are useful in their own right, they are not sufficient to allow a full FP style
within Python. The control structures of the language also need to be replaced, or at least supplemented, by an
FP approach. One way to achieve this is by exploiting a side effect of how Python evaluates boolean expressions.
Because Python uses short-circuit evaluation of boolean expressions, certain properties of these expressions
can be exploited. To recap on short-circuit evaluation: when a boolean expression is evaluated, the evaluation
starts at the left-hand expression and proceeds to the right, stopping when it is no longer necessary to
evaluate any further to determine the final outcome.
Taking some specific examples, let's see how short-circuit evaluation works.
First we define two functions, TRUE() and FALSE(), that tell us when they are being executed and return the
value of their names. Now we use these to explore how boolean expressions are evaluated:
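A minimal sketch of the two functions and a few test expressions, written for Python 3 (print is a function, and True/False are returned). The calls list is an addition for illustration: it records which functions actually ran, so the order of evaluation can be inspected afterwards:

```python
calls = []  # names of the functions, in the order they were actually called

def TRUE():
    # announce the call, record it, and return a true value
    print('TRUE')
    calls.append('TRUE')
    return True

def FALSE():
    # announce the call, record it, and return a false value
    print('FALSE')
    calls.append('FALSE')
    return False

print(TRUE() and FALSE())   # TRUE, FALSE, then False: both sides evaluated
print(FALSE() and TRUE())   # FALSE, then False: TRUE() is never called
print(TRUE() or FALSE())    # TRUE, then True: FALSE() is never called
print(FALSE() or TRUE())    # FALSE, TRUE, then True: both sides evaluated
```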
Notice that the second part of an AND expression is evaluated if, and only if, the first part is True. If the
first part is False, the second part will not be evaluated, since the expression as a whole cannot be true.
Likewise, in an OR based expression, if the first part is True then the second part need not be evaluated, since
the whole must be true.
We can use these properties to reproduce branching-like behaviour. For example, suppose we have a piece of
code like the following:
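A minimal sketch of such a piece of code and its expression-only equivalent, reusing the TRUE() and FALSE() functions described above (redefined here so the block runs on its own under Python 3):

```python
def TRUE():
    print('TRUE')
    return True

def FALSE():
    print('FALSE')
    return False

# Conventional branching
if TRUE():
    print("It is True")
else:
    print("It is False")

# The same choice made inside a single expression:
# if TRUE() is true, 'and' returns "It is True" (a true value), so 'or' stops there;
# if it were false, 'and' would return False and 'or' would return "It is False".
V = (TRUE() and "It is True") or "It is False"
print(V)   # It is True
```

One caution: the trick only works if the value for the true branch is itself true. If it were an empty string or 0, the and would yield that false value and the or would carry on to the "else" value. Python 2.5 and later also provide a conditional expression, "It is True" if TRUE() else "It is False", which has no such pitfall.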
Try working through that example and then substitute the call to TRUE() with a call to FALSE(). Thus, by
using short-circuit evaluation of boolean expressions, we have found a way to eliminate conventional if/else
statements from our programs. You may recall that in the recursion topic we observed that recursion could
be used to replace the loop construct. Combining these two effects can therefore remove all conventional
control structures from our program, replacing them with pure expressions. This is a big step towards
enabling pure FP style solutions.
Conclusions
At this point you may be wondering what exactly the point of all of this is. You would not be alone.
Although FP appeals to many computer science academics (and often to mathematicians), most practising
programmers seem to use FP techniques sparingly, in a kind of hybrid fashion, mixing them with more
traditional imperative styles as they feel appropriate.
When you have to apply operations to elements in a list, such that map, reduce or filter seem the
natural way to express the solution, then by all means use them. Just occasionally you may even find that
recursion is more appropriate than a conventional loop. Even more rarely will you find a use for short-circuit
evaluation rather than conventional if/else, particularly if it is required within an expression. As with any
programming tool, don't get carried away with the philosophy; rather, use whichever tool is most appropriate
to the task in hand. At least you know that alternatives exist!
There is one final point to make about lambda. There is one area outside the scope of FP where lambda finds
a real use, and that's in defining event handlers in GUI programming. Event handlers are often very short
functions, or maybe they simply call some larger function with a few hard-wired argument values. In either
case a lambda function can be used as the event handler, which avoids the need to define lots of small
individual functions and fill up the namespace with names that would only be used once. Remember that a
lambda expression returns a function object. This function object is the one passed to the widget and is called
at the time the event occurs. If you recall how we define a Button widget in Tkinter, then a lambda would
appear like this:
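A minimal sketch, assuming Python 3 (where the module is named tkinter rather than Tkinter) and a hypothetical write() function that prints its argument; the pressed list is also an illustrative addition. The Tk import sits inside build_ui() so the rest of the file can be loaded even where Tk is unavailable:

```python
pressed = []  # every message passed to write(), kept so it can be inspected

def write(s):
    # hypothetical handler body: remember and print the message
    pressed.append(s)
    print(s)

def build_ui():
    import tkinter as tk  # named 'Tkinter' in Python 2
    top = tk.Tk()
    # command= expects a function taking no arguments; the lambda wraps
    # a call to write() with a hard-wired string
    b = tk.Button(top, text="Press Me",
                  command=lambda: write("I got pressed!"))
    b.pack()
    top.mainloop()
```

Calling build_ui() opens the window and starts the event loop; each press of the button prints the message.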
Of course, in this case we could have done the same thing by just assigning a default parameter value to
write() and assigning write to the command value of the Button. However, even here the lambda form
gives us the advantage that the single write() function can now be used for multiple buttons just by passing
a different string from the lambda. Thus we can add a second button:
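A sketch of two buttons sharing one handler, again assuming Python 3's tkinter and a hypothetical write() that prints its argument. When the callbacks are created in a loop, the standard idiom lambda m=m: ... captures each value as a default argument; without it, every lambda would see the loop variable's final value. The make_commands helper is illustrative:

```python
pressed = []

def write(s):
    # hypothetical shared handler: remember and print the message
    pressed.append(s)
    print(s)

def make_commands(messages):
    # One zero-argument callback per message; m=m binds the current value
    return [lambda m=m: write(m) for m in messages]

def build_ui():
    import tkinter as tk  # named 'Tkinter' in Python 2
    top = tk.Tk()
    labels = ["Press Me", "Or Me"]
    commands = make_commands(["I got pressed!", "So did I!"])
    for text, cmd in zip(labels, commands):
        # both buttons reuse write(); only the hard-wired string differs
        tk.Button(top, text=text, command=cmd).pack()
    top.mainloop()
```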
We can also employ lambda when using the bind technique, which sends an event object as an argument:
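A sketch assuming Python 3's tkinter and the same kind of hypothetical write() function. Because bind() passes an event object to its handler, the lambda must accept one argument (named ev here) even if it ignores it; "<Button-1>" is Tk's name for a press of the left mouse button:

```python
pressed = []

def write(s):
    # hypothetical handler body: remember and print the message
    pressed.append(s)
    print(s)

def build_ui():
    import tkinter as tk  # named 'Tkinter' in Python 2
    top = tk.Tk()
    b = tk.Button(top, text="Press me as well")
    # bind() calls the handler with an event object, so this lambda takes
    # one argument (ignored here), unlike a command= callback which takes none
    b.bind("<Button-1>", lambda ev: write("Pressed"))
    b.pack()
    top.mainloop()
```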
Well, that really is that for Functional Programming. There are lots of other resources if you want to look
deeper into it; some are listed below.
Other resources
If anyone else finds a good reference, drop me an email.
Conclusions
A Case Study
For this case study we are going to expand on the word counting program we
developed earlier. We are going to create a program which mimics the Unix wc
program in that it outputs the number of lines, words and characters in a file. We will
go further than that, however, and also output the number of sentences, clauses, words,
letters and punctuation characters in a text file. We will follow the development of
this program stage by stage, gradually increasing its capability, then moving it into a
module to make it reusable, and finally turning it into an OO implementation for
maximum extensibility.
It will be a Python implementation, but at least the initial stages could be written in BASIC or Tcl instead. As
we move to the more complex parts we will make increasing use of Python's built-in data structures, and
therefore the difficulty of using BASIC will increase, although Tcl will still be an option. Finally, the OO
aspects will only apply to Python.
Additional features that could be implemented but will be left as exercises for the reader are:
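The word counting program we developed earlier looked something like this:
inp = open("menu.txt","r")
total = 0
for line in inp.readlines():   # add up the words on each line
    total = total + len(line.split())
print "File had %d words" % total
inp.close()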
We need to add a line and character count. The line count is easy: since we loop over
each line, we just need a variable to increment on each iteration of the loop. The
character count is only marginally harder, since we can iterate over the list of words,
adding their lengths to yet another variable.
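A minimal sketch of those counters, written for Python 3 and working on any iterable of lines (an open file or a plain list), so it can be tried without a file. The function name wc_counts is an illustration, not part of the original program. As described above, characters are counted by adding up word lengths, so spaces and newlines are not included (unlike the real wc):

```python
def wc_counts(lines):
    # Return (lines, words, characters) for an iterable of text lines
    line_count = word_count = char_count = 0
    for line in lines:
        line_count = line_count + 1          # one more line each time round
        words = line.split()
        word_count = word_count + len(words)
        for word in words:
            char_count = char_count + len(word)
    return line_count, word_count, char_count

sample = ["Spam and eggs\n", "\n", "Ham, spices.\n"]
print(wc_counts(sample))  # (3, 5, 22)
```

With a real file, the open file object can be passed directly, for example wc_counts(open(name)), since iterating over a file yields its lines.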
We also need to make the program more general purpose by reading the name of the file from the command
line or, if it is not provided, prompting the user for the name. (An alternative strategy would be to read from
standard input, which is what the real wc does.)
So the final wc looks like:
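import sys
if len(sys.argv) != 2:   # get the file name from the user...
    name = raw_input("Enter the file name: ")
else:                    # ...or from the commandline
    name = sys.argv[1]
inp = open(name,"r")
lines, words, chars = 0, 0, 0   # initialise the counters
for line in inp.readlines():
    lines = lines + 1
    for word in line.split(): words, chars = words + 1, chars + len(word)
print "%s has %d lines, %d words and %d characters" % (name, lines, words, chars)
inp.close()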
If you are familiar with the Unix wc command, you know that you can pass it a wild-carded filename to get
stats for all matching files as well as a grand total. This program only caters for straight filenames. If you
want to extend it to cater for wild cards, take a look at the glob module and build a list of names, then simply
iterate over the file list. You'll need temporary counters for each file and then cumulative counters for the
grand totals. Or you could use a dictionary instead...
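One possible extension along those lines, sketched for Python 3: glob.glob() expands the wildcard into a list of matching names, a dictionary keeps each file's counts, and a running list holds the grand total. The function names count_lines, wc_many and report are illustrative:

```python
import glob

def count_lines(lines):
    # (lines, words, characters) for an iterable of lines; characters are
    # the total length of the words, as in the single-file version
    n_lines = n_words = n_chars = 0
    for line in lines:
        words = line.split()
        n_lines = n_lines + 1
        n_words = n_words + len(words)
        n_chars = n_chars + sum(len(w) for w in words)
    return n_lines, n_words, n_chars

def wc_many(pattern):
    # Expand a wildcard pattern such as "*.txt" and count every matching file
    per_file = {}          # filename -> (lines, words, chars) for that file
    total = [0, 0, 0]      # cumulative counters for the grand total
    for name in sorted(glob.glob(pattern)):
        with open(name) as f:
            counts = count_lines(f)
        per_file[name] = counts
        for i in range(3):
            total[i] = total[i] + counts[i]
    return per_file, tuple(total)

def report(pattern):
    # Print one line per file followed by a grand total, wc style
    per_file, total = wc_many(pattern)
    for name in sorted(per_file):
        print("%7d %7d %7d %s" % (per_file[name] + (name,)))
    print("%7d %7d %7d total" % total)
```

Calling report("*.txt") would print the counts for each matching file followed by the grand total.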
When I started to think about how we could extend this to count sentences and words
rather than 'character groups' as above, my initial idea was to loop through the
file extracting the lines into a list, then loop through each line extracting the words into
another list, and finally process each 'word' to remove extraneous characters.
Thinking about it a little further, it becomes evident that if we simply collect the words and punctuation
characters we can analyse the latter to count sentences, clauses etc. (by defining what we consider a
sentence/clause in terms of punctuation items). This means we only need to iterate over the file once and
then iterate over the punctuation, a much smaller list. Let's try sketching that in pseudo-code:
sentence count = sum of('.', '?', '!')
clause count = sum of all punctuation (very poor definition...)
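A compact Python 3 sketch of that plan, assuming (as the pseudo-code does) that a sentence ends in '.', '?' or '!' and that every punctuation mark ends a clause. It collects the punctuation in one pass over the text, then derives the counts from the much smaller punctuation tally. The function name, the returned dictionary and the exact punctuation set are illustrative; unlike a trimming approach, this simple version counts punctuation anywhere inside a group:

```python
STOP_TOKENS = '.?!'                   # characters taken to end a sentence
PUNCTUATION = '&()-;:,' + STOP_TOKENS

def text_stats(text):
    punct = {}     # punctuation character -> number of times seen
    words = 0
    # Single pass over the text: split into character groups and
    # tally any punctuation found inside each group
    for line in text.splitlines():
        for group in line.split():
            for ch in group:
                if ch in PUNCTUATION:
                    punct[ch] = punct.get(ch, 0) + 1
            # a group made only of punctuation (e.g. "-") is not a word
            if any(ch.isalnum() for ch in group):
                words = words + 1
    # Second, much shorter pass: derive the counts from the punctuation
    sentences = sum(punct.get(ch, 0) for ch in STOP_TOKENS)
    clauses = sum(punct.values())     # the "very poor definition" of a clause
    return {'words': words, 'sentences': sentences,
            'clauses': clauses, 'punctuation': punct}

stats = text_stats("Hello, world. Is this spam?\nYes - lots!")
print(stats['words'], stats['sentences'], stats['clauses'])  # 7 3 5
```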
import sys, string
############################
# initialise global variables
para_count = 1
line_count, sentence_count, clause_count, word_count = 0,0,0,0
groups = []
punctuation_counts = {}
alphas = string.letters + string.digits
stop_tokens = ['.','?','!']
punctuation_chars = ['&','(',')','-',';',':',','] + stop_tokens
for c in punctuation_chars:
    punctuation_counts[c] = 0
format = """%s contains:
%d paragraphs, %d lines and %d sentences.
These in turn contain %d clauses and a total of %d words."""
############################
# Now define the functions that do the work
def getCharGroups(infile):
    pass
def getPunctuation(wordList):
    pass
def reportStats():
    print format % (sys.argv[1],para_count,
                    line_count, sentence_count,
                    clause_count, word_count)
def Analyze(infile):
    getCharGroups(infile)
    getPunctuation(groups)
    reportStats()
if __name__ == "__main__":
    if len(sys.argv) != 2:
        print "Usage: python grammar.py <filename>"
        sys.exit()
    else:
        Document = open(sys.argv[1],"r")
        Analyze(Document)
Rather than trying to show the whole thing in one long listing, I'll discuss this skeleton
and then we will look at each of the three significant functions in turn. To make the program
work you will need to paste it all together at the end, however.
The first thing to notice is the commenting at the top. This is common practice to let readers of the file get an
idea of what it contains and how it should be used. The version information (author and date) is useful too if
comparing results with someone else who may be using a more or less recent version.
The final section uses a feature of Python: any module run from the command line, rather than imported, is
given the name "__main__". We can test the special, built-in __name__ variable and, if it is "__main__", we
know the module is not just being imported but run, and so we execute the trigger code inside the if.
This trigger code includes a user-friendly hint about how the program should be run if no filename is
provided, or indeed if too many filenames are provided.
Finally notice that the Analyze() function simply calls the other functions in the right order. Again this is
quite common practice to allow a user to choose to either use all of the functionality in a straightforward
manner (through Analyze()) or to call the low level primitive functions directly.
getCharGroups()
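def getCharGroups(infile):
    global para_count, line_count, groups
    try:
        for line in infile.readlines():
            line_count = line_count + 1
            if line.strip() == "":    # a blank line starts a new paragraph
                para_count = para_count + 1
            else:
                groups = groups + string.split(line)
    except:
        print "Failed to read file ", sys.argv[1]
        sys.exit()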
Note 1: We have to use the global keyword here to declare the variables which are
created outside of the function. If we didn't, then when we assign to them Python would
create new variables of the same name local to this function. Changing these local
variables would have no effect on the module level (or global) values.
Note 2: We have used a try/except clause here to trap any errors, report the failure and exit the
program.
getPunctuation()
This takes a little bit more effort and uses a couple of new features of Python.
def getPunctuation(wordList):
    global punctuation_counts
    for item in wordList:
        while item and (item[-1] not in alphas):
            p = item[-1]
            item = item[:-1]
            if p in punctuation_counts.keys():
                punctuation_counts[p] = punctuation_counts[p] + 1
            else: punctuation_counts[p] = 1
Notice that this does not include the final if/else clause of the pseudo-code version. I
left it off for simplicity and because I felt that in practice very few words containing
only punctuation characters would be found. We will, however, add it to the final
version of the code.
Note 1: We have parameterised the wordList so that users of the module can supply their own list rather than
being forced to work from a file.
Note 2: We assigned item[:-1] to item. This is known as slicing in Python, and the colon simply says
treat the index as a range. We could, for example, have specified item[3:6] to extract item[3],
item[4] and item[5] into a new sequence (here a string, since item is a string).
The default range is the start or end of the sequence, depending on which side of the colon is blank. Thus
item[3:] would signify all members of item from item[3] to the end. Again, this is a very useful
Python feature. Slicing creates a new string and the name item is simply rebound to it. Strings cannot be
changed in place, so the original string, and the entry in wordList that refers to it, are left untouched.
Note 3: We use a negative index to extract the last character from item. This is a very useful Python
feature. Also we loop in case there are multiple punctuation characters at the end of a group.
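A few Python 3 examples of the slicing and negative indexing described in Notes 2 and 3, using a made-up character group. The comments give the value of each expression; isalnum() stands in for the alphas string used by the module:

```python
item = "(spam!)"

last = item[-1]           # ')'       negative index: the last character
all_but_last = item[:-1]  # '(spam!'  from the start up to, not including, the last
from_one = item[1:]       # 'spam!)'  from index 1 to the end
middle = item[3:6]        # 'am!'     item[3], item[4] and item[5]

# Strip trailing non-alphanumeric characters one at a time,
# in the same way as the getPunctuation loop
word = item
while word and not word[-1].isalnum():
    word = word[:-1]

print(word)   # (spam
print(item)   # (spam!)  the original string is unchanged
```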
In testing this it became obvious that we need to do the same at the front of a group too since, although
closing brackets are detected, opening ones aren't! To overcome this problem I will create a new function,
trim(), that will remove punctuation from the front and back of a single character group:
#########################################################
# Note trim uses recursion where the terminating condition
# is either 0 or -1. A ValueError ("InvalidEnd") is raised
# for anything other than -1, 0 or 2.
##########################################################
def trim(item,end = 2):
    """ remove non alphas from left(0), right(-1) or both ends of
    item and return what is left"""
    if end == 2:
        # trim the left end first, then pass the result on to trim the right
        return trim(trim(item, 0), -1)
    elif end in (0, -1):
        while (len(item) > 0) and (item[end] not in alphas):
            ch = item[end]
            if ch in punctuation_counts.keys():
                punctuation_counts[ch] = punctuation_counts[ch] + 1
            if end == 0: item = item[1:]
            if end == -1: item = item[:-1]
        return item
    else:
        raise ValueError("InvalidEnd: end must be -1, 0 or 2")
Notice how the use of recursion combined with a defaulted parameter enables us to
define a single trim function which by default trims both ends but, by passing in an
end value, can be made to operate on only one end. The end values are chosen to
reflect Python's indexing system: 0 for the left end and -1 for the right. I originally
wrote two trim functions, one for each end, but the amount of similarity made me
realize that I could combine them using a parameter.
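def getPunctuation(wordList):
    # trim() returns the trimmed string, so store it back in the list
    for i in range(len(wordList)):
        wordList[i] = trim(wordList[i])
    # Now delete any empty 'words', working backwards from the end
    # so that deleting one doesn't shift the ones still to be checked
    for i in range(len(wordList)-1, -1, -1):
        if len(wordList[i]) == 0:
            del(wordList[i])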
Note 2: In the interests of reusability we might have done better to break trim down into still smaller chunks.
This would have enabled us to create a function for removing a single punctuation character from either the
front or back of a word and returning the character removed. Another function would then call that one
repeatedly to get the end result. However, since our module is really about producing statistics from text
rather than general text processing, that function should properly go in a separate module which we could
then import. But since that module would only contain the one function, it doesn't seem too useful either. So
I'll leave it as is!
The final grammar module
The only thing remaining is to improve the reporting to include the punctuation characters and the counts.
Replace the existing reportStats() function with this:
def reportStats():
    global sentence_count, clause_count
    for p in stop_tokens:
        sentence_count = sentence_count + punctuation_counts[p]
    for c in punctuation_counts.keys():
        clause_count = clause_count + punctuation_counts[c]
    print format % (sys.argv[1],
                    para_count, line_count, sentence_count,
                    clause_count, len(groups))
    print "The following punctuation characters were used:"
    for p in punctuation_counts.keys():
        print "\t%s\t:\t%3d" % (p, punctuation_counts[p])
If you have carefully stitched all the above functions in place you should now be able
to type:
C:> python grammar.py myfile.txt
and get a report on the stats for your file myfile.txt (or whatever it's really called).
How useful this is to you is debatable, but hopefully reading through the evolution of
the code has given you some idea of how to create your own programs. The main
thing is to try things out. There's no shame in trying several approaches; often you
learn valuable lessons in the process.
To conclude our course we will rework the grammar module to use OO techniques. In the process you will
see how an OO approach results in modules which are even more flexible for the user and more extensible
too.
By moving these globals into a class we can then create multiple instances of the class (one per file) and
each instance gets its own set of variables. Further, by making the methods sufficiently granular we can
create an architecture whereby it is easy for the creator of a new type of document object to modify the
search criteria to cater for the rules of the new type (e.g. by rejecting all HTML tags from the word list).
#! /usr/local/bin/python
################################
# Module: document.py
# Author: A.J. Gauld
# Date: 2000/08/12
# Version: 2.0
################################
# This module provides a Document class which
# can be subclassed for different categories of
# Document(text, HTML, Latex etc). Text and HTML are
# provided as samples.
#
# Primary services available include
# - getCharGroups(),
# - getWords(),
# - reportStats().
################################
import sys,string
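class Document:
    def __init__(self, filename):
        self.filename = filename
        self.para_count = 1
        self.line_count, self.sentence_count, self.clause_count, \
            self.word_count = 0,0,0,0
        self.alphas = string.letters + string.digits
        self.stop_tokens = ['.','?','!']
        self.punctuation_chars = ['&','(',')','-',';',':',','] + \
            self.stop_tokens
        self.lines = []
        self.groups = []
        self.punctuation_counts = {}
        for c in self.punctuation_chars + self.stop_tokens:
            self.punctuation_counts[c] = 0
        self.format = """%s contains:
%d paragraphs, %d lines and %d sentences.
These in turn contain %d clauses and a total of %d words."""
    def getLines(self):
        try:
            self.infile = open(self.filename,"r")
            self.lines = self.infile.readlines()
            self.infile.close()
        except IOError:
            print "Failed to read file ",self.filename
            sys.exit()
    def getCharGroups(self, lines):
        # as in grammar.py: count lines and paragraphs, collect the groups
        for line in lines:
            self.line_count = self.line_count + 1
            if line.strip() == "":    # a blank line starts a new paragraph
                self.para_count = self.para_count + 1
            else:
                self.groups = self.groups + string.split(line)
    def getWords(self):
        pass
    def reportStats(self):
        # as in grammar.py, but using this instance's own counters
        for p in self.stop_tokens:
            self.sentence_count = self.sentence_count + self.punctuation_counts[p]
        for c in self.punctuation_counts.keys():
            self.clause_count = self.clause_count + self.punctuation_counts[c]
        print self.format % (self.filename, self.para_count, self.line_count,
                             self.sentence_count, self.clause_count,
                             len(self.groups))
    def Analyze(self):
        self.getLines()
        self.getCharGroups(self.lines)
        self.getWords()
        self.reportStats()
class TextDocument(Document):
    pass
class HTMLDocument(Document):
    pass
if __name__ == "__main__":
    if len(sys.argv) != 2:
        print "Usage: python document.py <filename>"
        sys.exit()
    else:
        D = Document(sys.argv[1])
        D.Analyze()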
Now, to implement the class, we need to define the getWords method. We could simply copy what we did
in the previous version and create a trim method; however, we want the OO version to be easily extensible, so
instead we'll break getWords down into a series of steps. Then, in subclasses, we only need to override the
substeps and not the whole getWords method. This should allow a much wider scope for dealing with
different types of document.
Specifically, we will add methods to reject groups which we recognise as invalid and to trim unwanted
characters from the front and from the back. Thus we add three methods to Document and implement
getWords in terms of these methods.
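class Document:
    # .... as above
    def getWords(self):
        # the trim methods return the trimmed word, so store it back
        for i in range(len(self.groups)):
            w = self.ltrim(self.groups[i])
            self.groups[i] = self.rtrim(w)
        self.removeExceptions()
    def removeExceptions(self):
        pass
    def ltrim(self,word):
        pass
    def rtrim(self,word):
        pass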
Notice, however, that we define the bodies with the single statement pass, which does absolutely nothing.
Instead, we will define how these methods operate for each concrete document type.
Text Document
A text document looks like:
class TextDocument(Document):
    def ltrim(self,word):
        while (len(word) > 0) and (word[0] not in self.alphas):
            ch = word[0]
            if ch in self.punctuation_counts.keys():
                self.punctuation_counts[ch] = self.punctuation_counts[ch] + 1
            word = word[1:]
        return word
    def rtrim(self,word):
        while (len(word) > 0) and (word[-1] not in self.alphas):
            ch = word[-1]
            if ch in self.punctuation_counts.keys():
                self.punctuation_counts[ch] = self.punctuation_counts[ch] + 1
            word = word[:-1]
        return word
    def removeExceptions(self):
        top = len(self.groups)
        n = 0
        while n < top:
            if (len(self.groups[n]) == 0):
                del(self.groups[n])
                top = top - 1    # the list is one shorter; don't advance n
            else:
                n = n+1
The trim functions are virtually identical to our grammar.py module's trim function,
but split into two. The removeExceptions function has been defined to remove blank
words.
Notice that I have changed the structure of the latter method to use a while loop instead of the previous
for loop. This is because, during testing, a bug was found: if we deleted elements from the list, the
range (calculated at the beginning) still had the original length and we wound up trying to access members
of the list beyond the end. To avoid that, we use a while loop and adjust the maximum index each time we
remove an element.
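The problem, and two common ways around it, can be seen in a short Python 3 sketch. The function names are illustrative. Deleting while walking forward with a precomputed range skips items and runs off the end; walking backwards, or building a new list with a list comprehension, avoids both problems:

```python
def remove_blanks_backwards(words):
    # Deleting from the end first means earlier indexes never shift
    for i in range(len(words) - 1, -1, -1):
        if len(words[i]) == 0:
            del words[i]
    return words

def remove_blanks_new_list(words):
    # Build a fresh list containing only the non-empty words
    # (the caller must use the returned list; the original is unchanged)
    return [w for w in words if len(w) > 0]

def remove_blanks_buggy(words):
    # The broken pattern: range() is fixed at the original length
    for i in range(len(words)):
        if len(words[i]) == 0:   # raises IndexError once the list has shrunk
            del words[i]
    return words

sample = ["spam", "", "", "eggs"]
print(remove_blanks_backwards(list(sample)))   # ['spam', 'eggs']
print(remove_blanks_new_list(sample))          # ['spam', 'eggs']
```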
HTML Document
For HTML we will use a feature of Python that we haven't seen before: regular
expressions. These are special string patterns that we can use for finding complex
strings. Here we use them to remove anything between < and >. This means we will
need to redefine getWords. The actual stripping of punctuation should be the same
as for plain text, so instead of inheriting directly from Document we will inherit from
TextDocument and reuse its trim methods.
class HTMLDocument(TextDocument):
    def removeExceptions(self):
        """ use regular expressions to remove all <.+?> """
        import re
        tag = re.compile("<.+?>")  # use non greedy re
        L = 0
        while L < len(self.lines):
            if len(self.lines[L]) > 1:  # if it's not blank
                self.lines[L] = tag.sub('', self.lines[L])
                if len(self.lines[L]) == 1:
                    del(self.lines[L])
                else: L = L+1
            else: L = L+1
    def getWords(self):
        self.removeExceptions()
        # getCharGroups split the groups from the raw lines, so
        # rebuild them from the tag-free lines before trimming
        self.groups = []
        for line in self.lines:
            self.groups = self.groups + string.split(line)
        for i in range(len(self.groups)):
            w = self.groups[i]
            w = self.ltrim(w)
            self.groups[i] = self.rtrim(w)
        TextDocument.removeExceptions(self)  # now strip empty words
Note 1: The only thing to note here is the call to self.removeExceptions before
trimming and then calling TextDocument.removeExceptions. If we had relied on
the inherited getWords it would have called our removeExceptions after trimming
which we don't want.
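The difference between the non-greedy pattern used above and a greedy one shows up as soon as a line contains more than one tag. A Python 3 sketch using the standard re module (the sample line is made up):

```python
import re

line = "<p>Spam <b>and</b> eggs</p>\n"

greedy = re.compile("<.+>")        # .+ grabs as much of the line as it can
non_greedy = re.compile("<.+?>")   # .+? stops at the first closing '>'

# Greedy: one match from the first '<' to the last '>', so the text goes too
print(repr(greedy.sub('', line)))      # '\n'
# Non-greedy: each tag is matched and removed separately
print(repr(non_greedy.sub('', line)))  # 'Spam and eggs\n'
```

An equivalent pattern that avoids the non-greedy qualifier is "<[^>]+>", which simply forbids '>' inside the match. Neither version handles a tag split across two lines, since the module processes one line at a time.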
Adding a GUI
To create a GUI we will use Tkinter, which we introduced briefly in the Event Driven
Programming section. This time the GUI will be slightly more sophisticated and use
more of the graphical controls, or widgets, that Tkinter provides.
Before we get to that stage we need to modify our Document class. The current
version prints out the results to stdout as part of the Analyze method. However, for a
GUI we really don't want that. Instead we would like the Analyze method simply to
store the totals in the counter attributes, so that we can access them as needed. To do this
we simply split, or refactor, the reportStats() method into two parts:
generateStats(), which will calculate the values and store them in the counters, and
printStats(), which will print to stdout.
Finally we need to modify Analyze to call generateStats(), and the main sequence to call printStats()
explicitly after Analyze. With these changes in place the existing code will carry on working as before, at
least as far as the command line user is concerned. Other programmers will have to make slight changes to
their code to call printStats() after using Analyze, which is not too onerous a change.
    def generateStats(self):
        self.word_count = len(self.groups)
        for c in self.stop_tokens:
            self.sentence_count = self.sentence_count + \
                                  self.punctuation_counts[c]
        for c in self.punctuation_counts.keys():
            self.clause_count = self.clause_count + \
                                self.punctuation_counts[c]
    def printStats(self):
        print self.format % (self.filename, self.para_count,
                             self.line_count, self.sentence_count,
                             self.clause_count, self.word_count)
        print "The following punctuation characters were used:"
        for i in self.punctuation_counts.keys():
            print "\t%s\t:\t%4d" % (i,self.punctuation_counts[i])
    def Analyze(self):
        self.getLines()
        self.getCharGroups(self.lines)
        self.getWords()
        self.generateStats()
and:
if __name__ == "__main__":
    if len(sys.argv) != 2:
        print "Usage: python document.py <filename>"
        sys.exit()
    else:
        try:
            D = HTMLDocument(sys.argv[1])
            D.Analyze()
            D.printStats()
        except:
            print "Error analyzing file: %s" % sys.argv[1]
Now we are ready to create a GUI wrapper around our document classes.
Designing a GUI
The first step is to try to visualise how it will look. We need to specify a filename, so it will require an Edit
or Entry control. We also need to specify whether we want textual or HTML analysis, this type of 'one from
many' choice is usually represented by a set of Radiobutton controls. These controls should be grouped
together to show that they are related.
The next requirement is for some kind of display of the results. We could opt for multiple Label controls, one
per counter. Instead I will use a simple text control into which we can insert strings; this is closer to the spirit
of the command line output, but ultimately the choice is a matter of preference for the designer.
Finally we need a means of initiating the analysis and quitting the application. Since we will be using a text
control to display results, it might be useful to have a means of resetting the display too. These command
options can all be represented by Button controls.
+-------------------------+-----------+
| FILENAME                | O TEXT    |
|                         | O HTML    |
+-------------------------+-----------+
|                                     |
|                                     |
|                                     |
|                                     |
|                                     |
+-------------------------------------+
|                                     |
|   ANALYZE      RESET       QUIT     |
|                                     |
+-------------------------------------+
Now we can write some code; let's take it step by step:
from Tkinter import *
import document
Here we have imported the Tkinter and document modules. For the former we have made all of the Tkinter
names visible within our current module whereas with the latter we will need to prefix the names with
'document.'
We have also defined an __init__ method which calls the Frame.__init__ superclass method to
ensure that Tkinter is set up properly internally. We then create an attribute which will store the document
type value and finally call the buildUI method which creates all the widgets for us.
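    def buildUI(self):
        # Now the file information: File name and type
        fFile = Frame(self)
        Label(fFile, text="Filename: ").pack(side="left")
        self.eName = Entry(fFile)
        self.eName.insert(INSERT,"test.htm")
        self.eName.pack(side="left", padx=5)
        # to keep the radio buttons lined up with the
        # name we need another frame
        fType = Frame(fFile, borderwidth=1, relief=SUNKEN)
        self.rText = Radiobutton(fType, text="TEXT",
                                 variable = self.type, value=2,
                                 command=self.doText)
        self.rText.pack(side=TOP)
        self.rHTML = Radiobutton(fType, text="HTML",
                                 variable=self.type, value=1,
                                 command=self.doHTML)
        self.rHTML.pack(side=TOP)
        # make TEXT the default selection
        self.rText.select()
        fType.pack(side="right", padx=3)
        fFile.pack(side="top", fill=X)
        # the text box holds the results
        self.txtBox = Text(self, width=60, height=10)
        self.txtBox.pack(side=TOP, padx=3, pady=3)
        # finally the command buttons; doAnalyze and doQuit are
        # assumed names for the Analyze and Quit handlers
        fButts = Frame(self)
        Button(fButts, text="Analyze", command=self.doAnalyze).pack(side=LEFT, padx=10)
        Button(fButts, text="Reset", command=self.doReset).pack(side=LEFT, padx=10)
        Button(fButts, text="Quit", command=self.doQuit).pack(side=RIGHT, padx=10)
        fButts.pack(side=BOTTOM, fill=X)
        self.pack()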
I'm not going to explain all of that; instead I recommend you take a look at the Tkinter tutorial found on the
Python web site. This is an excellent introduction and reference to Tkinter. The general principle is that you
create widgets from their corresponding classes, providing options as named parameters, and then the widget
is packed into its containing frame.
The other key points to note are the use of subsidiary Frame widgets to hold the Radiobuttons and
Command buttons. The Radiobuttons also take a pair of options called variable and value: the former
links the Radiobuttons together by specifying the same external variable (self.type), and the latter gives
a unique value for each Radiobutton. Also notice the command=xxx options passed to the button controls.
These are the methods that will be called by Tkinter when the button is pressed. The code for these comes
next:
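    # restore default settings
    def doReset(self):
        self.txtBox.delete(1.0, END)
        self.rText.select()
        self.type = 2   # select() doesn't call doText, so reset the type too
    def doText(self):
        self.type = 2
    def doHTML(self):
        self.type = 1
    # assumed name for the Quit button's handler
    def doQuit(self):
        self.quit()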
These methods are all fairly trivial and hopefully by now are self-explanatory. The final event handler is the
one which does the analysis:
Again you should be able to read this and see what it does. The key points are that:
All that's needed now is to create an instance of the Application object and set the event loop running. We do
this here:
myApp = GrammarApp()
myApp.mainloop()
Let's take a look at the final result as seen under MS Windows, displaying the results of analyzing a test
HTML file, first in Text mode and then in HTML mode:
That's it. You can go on to make the HTML processing more sophisticated if you want to. You can create
new modules for new document types. You can try swapping the text box for multiple labels packed into a
frame. But for our purposes we're done. The next section offers some ideas of where to go next, depending on
your programming aspirations. The main thing is to enjoy it and always remember: the computer is dumb!
References
Books to read
Python
Learning Python
Mark Lutz - O'Reilly press. Probably the best book on programming Python if
you already know another language. Typical O'Reilly style, so if you don't like
that you may prefer:
Internet Programming with Python
Guido Van Rossum et al - ??? Written by the language's creator with a strong
bias to internet programming including HTML, CGI and general sockets. It
does have a general language tutorial at the beginning though.
Programming Python
Mark Lutz - O'Reilly press. The classic text. It describes the whys and
wherefores of the language better than the others, strong on modules and OOP.
Also gives an intro to GUI programming.
There is also an excellent online book for more advanced Python programmers called Dive into Python.
Tcl
There are several other Tcl/Tk books but I have no personal experience with any but Ousterhout.
BASIC
There are many, many books on BASIC covering each of its many dialects. If you are serious about pursuing
programming in BASIC, especially on the PC, then I strongly recommend using Visual Basic and studying
any of the many books on that version.
General Programming
There are some classic programming texts that any serious programmer should own and read regularly. Here
are my personal favourites:
Code Complete
Steve McConnell - Microsoft Press. This is the most complete reference on all
things to do with writing code that I know. I read it after several years of
experience and it all rang true and I even learnt some new tricks. It literally
changed the way I wrote programs. Buy it. Now!
Programming Pearls
Jon Bentley - Addison Wesley. There are two volumes, both invaluable.
Bentley shows how to improve the efficiency of your programs in every
conceivable way, from concept through design to implementation.
These are part of a programming library that came out of Bell Labs in the 1980s in the wake of
Unix. There are so many classics in this series that I will simply say that anything from the pens of
Ken Thompson, Jon Bentley, Dennis Ritchie, Andrew Koenig and the rest at Bell Labs is worth
reading. The styles may vary but the content is pure gold.
From Clouds to Code
Jesse Liberty (Wrox Press). This book takes you through the process of
building a real OO application, warts and all. It's rather like our case study
but much bigger, and includes the use of design tools like UML.
Python
Tcl
The definitive Tcl site at the time of writing - it has a habit of moving!
BASIC
There are other online web sites for VB resources: components, tips, chat-rooms etc.
Programming in General
Try finding some general programming links pages on Yahoo etc. There are several good ones out there, I
have no particular favourite.
some specifics
• Rational Corp make upmarket tools and host some useful information about OO
development methods and the new UML modelling notation.
Projects to try
There are several ideas for projects listed in the tutorial. In addition I will give some ideas here, in
approximately ascending order of difficulty. Most will be achievable with the skills learnt here, but all of
them can be improved by checking the documentation that comes with Python for alternatives. A couple will
definitely require that you start digging for yourself; recall that one of the requirements of a good
programmer was curiosity!