PBE2
http://deepintopharo.com
Copyright 2013 by Alexandre Bergel, Damien Cassou, Stéphane Ducasse and Jannik Laval.
The contents of this book are protected under Creative Commons Attribution-ShareAlike 3.0
Unported license.
You are free:
to Share: to copy, distribute and transmit the work
to Remix: to adapt the work
Under the following conditions:
Attribution. You must attribute the work in the manner specified by the author or licensor (but
not in any way that suggests that they endorse you or your use of the work).
Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting
work only under the same, similar or a compatible license.
For any reuse or distribution, you must make clear to others the license terms of this
work. The best way to do this is with a link to this web page: creativecommons.org/licenses/by-sa/3.0/
Any of the above conditions can be waived if you get permission from the copyright
holder.
Nothing in this license impairs or restricts the author's moral rights.
Your fair dealing and other rights are in no way affected by the above. This
is a human-readable summary of the Legal Code (the full license):
creativecommons.org/licenses/by-sa/3.0/legalcode
Contents
1 Preface
I Libraries
2.4 Anatomy of a handler
2.6 Chapter summary
3.1 Getting started
3.7 Chapter summary
4 Sockets
4.1 Basic Concepts
4.2 TCP Client
4.3 TCP Server
4.4 SocketStream
4.6 Chapter summary
5.1 Settings architecture
5.3 Declaring a setting
5.6 Launching a script
5.9 Chapter summary
6.2 Regex syntax
6.3 Regex API
6.5 Chapter summary
II Source Management
8.4 Gofer actions
9.1 Introduction
9.7 Baselines
9.8 Groups
III Frameworks
10 Glamour
11.5 Layouts
12.9 Subviews
12.11 Events
12.12 Interaction
IV Language
13 Handling Exceptions
13.1 Introduction
13.14 Specific exceptions
14.1 Basics
14.6 Message execution
15.5 Negative numbers
15.8 Hexadecimal
V Tools
17 Profiling Applications
18.3 Testing a grammar
19 Biographies
Chapter 1
Preface
Smalltalk is well known as an excellent tool for agile and exploratory
programming. In this book the authors present a new dialect of
Smalltalk called Pharo that has been specifically designed for inventive
developers. The authors are key members of the Pharo team and accomplished Object Oriented educators, researchers and designers. Numerous Smalltalk projects from the authors and others have been ported to
Pharo. Enjoy Deep into Pharo!
- Dave Thomas [1]

Using a programming language is so far the most convenient way for
a human to tell a computer what it should do. Pharo is an object-oriented
programming language, highly influenced by Smalltalk. Pharo is more than
a syntax and a set of semantic rules, as most programming languages
are. Pharo comes with an extensible and flexible programming environment. Thanks to its numerous object-oriented libraries and frameworks,
Pharo shines for modeling and visualizing data, scripting, networking and
many other kinds of applications.
The very light syntax and the malleable object model of Pharo are commonly praised. Both early learners and experienced programmers enjoy the
"everything is an object" paradigm. The simplicity and expressiveness of
Pharo, as well as its living environment, empower programmers with a wonderful and unique experience.
Deep into Pharo is the second volume of a book series initiated with
Pharo by Example [2]. Deep into Pharo, the book you are reading, accompanies the reader on a fantastic journey into exciting parts of Pharo. It covers
new libraries such as FileSystem, frameworks such as Roassal and Glamour,
and complex aspects of the system such as exceptions and blocks.

[1] Dave Thomas (http://www.davethomas.net) is a well-known figure in modern software development
and object technology. He is perhaps best known as the founder and past CEO of Object
Technology International, Inc., now IBM OTI Labs. OTI was responsible for the initial development
of the Eclipse open source IDE and the Visual Age Java development environment.
[2] Freely available from http://pharobyexample.org

The book is divided into 5 parts and 17 chapters. The first part deals
with truly object-oriented libraries. The second part is about source code
management. The third part is about advanced frameworks. The fourth
part covers advanced topics of the language, in particular exceptions, blocks
and numbers. The fifth and last part is about tooling, including profiling and
parsing.
Pharo is supported by a strong community that grows daily. Pharo's
community is active and innovative, and is always pushing the limits of software
engineering. The Pharo community consists of software engineers, casual programmers, and also high-level consultants, researchers, and
teachers. This book exists because of the Pharo community, and we naturally
dedicate this book to this group of people that many of us consider our
second family.
Acknowledgments
We would like to thank various people who have contributed to this book.
In particular, we would like to thank:
Camillo Bruni for his participation in the Zero Configuration chapter.
Noury Bouraqadi and Luc Fabresse for the Socket chapter.
Alain Plantec for his effort in the Setting Framework chapter and his
effort to integrate it into Pharo.
Oscar Nierstrasz for writing and co-editing some chapters such as
Regex and Monticello.
Dale Henrichs and Mariano Martinez Peck for their participation in the
Metacello chapter.
Tudor Doru Girba for the Glamour chapter and the first documentation.
Clément Bera for his effort on the Exception chapter.
Nicolas Cellier for his participation in the Fun with Floats chapter.
Lukas Renggli for PetitParser and his work on the refactoring engine
and smallLint rules.
Jan Kurs and Guillaume Larcheveque for their participation in the PetitParser chapter.
Colin Putney for the initial version of FileSystem and Camillo Bruni
for his review of FileSystem and his rewrite of the Pharo Core.
Vanessa Peña for her participation in the Roassal and Mondrian chapters.
Renato Cerro for his help in proofreading and editing.
You, for your questions, support, bug fixes, contribution, and encouragement.
We would like to also thank Hernan Wilkinson and Carlos Ferro for their
reviews, Nicolas Cellier for the feedback on the number chapter, and Vassili
Bykov for permission to adapt his Regex documentation.
We thank Inria Lille Nord Europe for supporting this open-source project
and for hosting the web site of this book. We also thank Object Profile for
sponsoring the cover.
And last but not least, we also thank the Pharo community for its enthusiastic support of this project, and for informing us of the errors found in the
first edition of this book.
We are also grateful to our respective institutions and national research
agencies for their support and offered facilities. In particular, we thank
Program U-INICIA 11/06 VID 2011, University of Chile, and FONDECYT
project 1120094. We also thank the Plomo Équipe Associée.
Part I
Libraries
Chapter 2
2.1
If you do not have wget installed you can use curl -L instead.
To execute the script that we just downloaded, you should change its
permissions using chmod a+x or invoke it via bash as follows.
Configurations. There is a plethora of configurations available. The URL
for each script can be easily built from an image version and a vm following
the expression: get.pharo.org/$IMAGE+$VM
Possible values for $IMAGE are: 12 13 14 20 30 stable alpha
Possible values for $VM are: vm vmS vmLatest vmSLatest
Of course, one can also download just an image with get.pharo.org/$IMAGE or
just the VM with get.pharo.org/$VM.
Looking at the help. Now let's have a look at the script help.
bash 20+vm --help
The help says that the 20+vm command downloads the current virtual
machine and puts it into the pharo-vm folder. In addition, it creates several
scripts: pharo, to launch the system in headless mode, and pharo-ui, to launch the image in
UI mode. Finally, it also downloads the latest image and changes files.
This script downloads the latest Pharo 20 Image.
This script downloads the latest Pharo VM.
The following artifacts are created:
    Pharo.changes  A changes file for the Pharo Image
    Pharo.image    A Pharo image, to be opened with the Pharo VM
    pharo          Script to run the downloaded VM in headless mode
    pharo-ui       Script to run the downloaded VM in UI mode
    pharo-vm/      Directory containing the VM
Grabbing and executing it. If you just want to directly execute the script,
you can also do the following:
wget -O - get.pharo.org/20+vm | bash
The option -O - will output the downloaded bash file to standard out, so
we can pipe it to bash. If you do not like the log output of wget, use --quiet.
wget --quiet -O - get.pharo.org/20+vm | bash
Note for the believers in automated tasks. The scripts are fetched automatically by our Jenkins server (https://ci.inria.fr/pharo/job/Scripts-download/)
from the gitorious server https://gitorious.org/pharo-build/pharo-build. Yes, we believe in automated tasks that free our energy.
2.2
You can also use different scripts. For example get.pharo.org/vm only downloads the latest vm.
wget -O - get.pharo.org/vm | bash
Figure 2.1 shows the list of scripts available that you can get at
http://get.pharo.org.
2.3
We have a brand new and nice way to handle command line arguments. It
is self-documented and easily extendable. Let us have a look at how the
command line is handled. As usual we will start by showing you how to
find your way alone.
The --version argument gives the version of the virtual machine. If you
wish to obtain the version of the image, then you need to open the image,
use the World menu, and select About.
List of available handlers. The command line option --list lists the current option handlers. This list depends on the handlers that are currently
loaded in the system. In particular, it means that you can simply add a handler for your specific needs.
The following list shows the available handlers.
./pharo Pharo.image --list
Currently installed Command Line Handlers:
    st            Loads and executes .st source files
    Fuel          Loads fuel files
    config        Install and inspect Metacello Configurations from the command line
    save          Rename the image and changes file
    test          A command line test runner
    update        Load updates
    printVersion  Print image version
    eval          Directly evaluates passed in one line scripts
Note that this help is that of the associated handler, not that of the
generic command line system.
Usage: config [--help] <repository url> [<configuration>] [--install[=<version>]] [--group=<group>] [--username=<username>] [--password=<password>]
    --help            show this help message
    <repository url>  A Monticello repository name
    <configuration>   A valid Metacello Configuration name
    <version>         A valid version for the given configuration
    <group>           A valid Metacello group name
    <username>        An optional username to access the configuration's repository
    <password>        An optional password to access the configuration's repository
Examples:
    # display this help message
    pharo Pharo.image config
    # list all configurations of a repository
    pharo Pharo.image config $MC_REPOS_URL
    # list all the available versions of a configuration
    pharo Pharo.image config $MC_REPOS_URL ConfigurationOfFoo
    # install the stable version
    pharo Pharo.image config $MC_REPOS_URL ConfigurationOfFoo --install
    # install a specific version '1.5'
    pharo Pharo.image config $MC_REPOS_URL ConfigurationOfFoo --install=1.5
    # install a specific version '1.5' and only a specific group 'Tests'
    pharo Pharo.image config $MC_REPOS_URL ConfigurationOfFoo --install=1.5 --group=Tests
2.4
Anatomy of a handler
As we mentioned, the command line mechanism is open and can be extended. We will now look at how the handler for the eval option is defined.
Evaluating Pharo Expressions. You can use the command line to evaluate
expressions as follows: ./pharo Pharo.image eval '1+2'
./pharo Pharo.image eval --help
Usage: eval [--help] <smalltalk expression>
    --help                  list this help message
    <smalltalk expression>  a valid Smalltalk expression which is evaluated and
                            the result is printed on stdout
Documentation:
A CommandLineHandler that reads a string from the command line, outputs the
evaluated result and quits the image.
This handler either evaluates the arguments passed to the image:
    $PHARO_VM my.image eval 1 + 2
or it can read directly from stdin:
    echo "1+2" | $PHARO_VM my.image eval
Now the handler is defined as follows: First we define a subclass of CommandLineHandler. Here BasicCodeLoader is a subclass of CommandLineHandler
and EvaluateCommandLineHandler is a subclass of BasicCodeLoader.
BasicCodeLoader subclass: #EvaluateCommandLineHandler
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'System-CommandLine'
We then define the commandName on the class side as well as the method
isResponsibleFor:.
EvaluateCommandLineHandler class>>commandName
    ^ 'eval'
EvaluateCommandLineHandler class>>isResponsibleFor: commandLineArguments
    "directly handle top-level -e and --evaluate options"
    commandLineArguments withFirstArgument: [ :arg |
        (#('-e' '--evaluate') includes: arg)
            ifTrue: [ ^ true ] ].
    ^ commandLineArguments includesSubCommand: self commandName
EvaluateCommandLineHandler class>>description
    ^ 'Directly evaluates passed in one line scripts'
Then we define the method activate which will be executed when the option matches.
EvaluateCommandLineHandler>>activate
    self activateHelp.
    self arguments ifEmpty: [ ^ self evaluateStdIn ].
    self evaluateArguments.
    self quit.
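To make the extension mechanism more concrete, here is a minimal sketch of a custom handler that follows the same three steps as the eval handler. The class GreetCommandLineHandler, its category and the 'greet' command name are hypothetical and only serve as an illustration. The sketch assumes that FileStream stdout is available for writing to standard output, which depends on the Pharo version; more recent images may provide a different accessor.

```smalltalk
"Hypothetical handler, modelled on EvaluateCommandLineHandler."
CommandLineHandler subclass: #GreetCommandLineHandler
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'MyProject-CommandLine'

"Class side: the sub-command name that selects this handler."
GreetCommandLineHandler class>>commandName
    ^ 'greet'

"Class side: the one-line description shown by --list."
GreetCommandLineHandler class>>description
    ^ 'Prints a greeting for each name given on the command line'

"Instance side: executed when the command line matches 'greet'."
GreetCommandLineHandler>>activate
    self activateHelp.
    self arguments do: [ :name |
        "Assumption: FileStream stdout exists in the image being used."
        FileStream stdout nextPutAll: 'Hello, ', name; lf ].
    self quit
```

Once such a class is loaded, it should appear in the output of --list and could be invoked as ./pharo Pharo.image greet Alice Bob.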
2.5
Now that we have such scripts and the possibility to specify options, we can
write Jenkins scripts that rely on Bash as little as possible.
For example here is the command that we use in Jenkins for the project
XMLWriter (which is hosted on PharoExtras).
# Jenkins puts all the params after a / in the job name as well :(
export JOB_NAME=`dirname $JOB_NAME`
wget --quiet -O - get.pharo.org/$PHARO+$VM | bash
./pharo Pharo.image save $JOB_NAME --delete-old
./pharo $JOB_NAME.image --version > version.txt
REPO=http://smalltalkhub.com/mc/PharoExtras/$JOB_NAME/main
./pharo $JOB_NAME.image config $REPO ConfigurationOf$JOB_NAME --install=$VERSION --group='Tests'
./pharo $JOB_NAME.image test --junit-xml-output "XML-Writer-.*"
zip -r $JOB_NAME.zip $JOB_NAME.image $JOB_NAME.changes
2.6
Chapter summary
You can now easily get the latest version of Pharo and build
scripts around it. In addition, command-line handlers open new horizons for
using Pharo from shell scripts.
Chapter 3
3.1
Getting started
The framework supports different kinds of filesystems that are interchangeable and may transparently work with each other. Probably the most
common use of FileSystem is to work directly with files stored on your
hard drive. We are going to work with that one for now.
The class FileSystem offers factory class methods that give access to different filesystems. Sending the message disk to FileSystem returns a filesystem
representing your physical hard drive. Sending memory creates a new filesystem
stored in memory, inside the image.
| working |
working := FileSystem disk workingDirectory.
/Users/ducasse/Workspace/FirstCircle/Pharo/20
3.2
Notice that children returns the direct files and folders. To recursively access all the children of the current directory you should use the message
allChildren as follows:
working allChildren.
'/Users/ducasse/Workspace/FirstCircle/Pharo/20' asFileReference
Note that no error is raised if the string does not point to an existing file.
You can however check whether the file exists or not:
'foobarzork' asFileReference exists
false
All .st files. Filtering is done using standard pattern matching on file
names. To find all .st files in the working directory, simply execute:
working allChildren select: [ :each | each basename endsWith: '.st' ]
The basename message returns the name of the file from a full name (e.g.,
the basename of /foo/gloops.taz is 'gloops.taz').
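Matching on the end of the basename can also select entries whose name merely ends with the same letters. As a hedged alternative, the sketch below filters on the extension message (described later in this chapter) and keeps only regular files. The variable name sources is ours.

```smalltalk
| working sources |
working := FileSystem disk workingDirectory.
"Keep only regular files whose extension is exactly 'st'."
sources := working allChildren select: [ :each |
    each isFile and: [ each extension = 'st' ] ].
"Answer the file names only, without their directories."
sources collect: [ :each | each basename ]
```

Since allChildren is recursive, this also visits sub-directories, which may take a while on a large tree.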
Accessing a given file or directory. Use the slash operator to obtain a reference to a specific file or directory within your working directory:
| working cache |
working := FileSystem disk workingDirectory.
cache := working / 'package-cache'.
Getting to the parent folder. Navigating back to the parent is easy using
the parent message:
| working cache parent |
working := FileSystem disk workingDirectory.
cache := working / 'package-cache'.
parent := cache parent.
parent = working
true
true
false
false
true
'package-cache'
'/Users/ducasse/Workspace/FirstCircle/Pharo/20/package-cache'
cache parent fullName
'/Users/ducasse/Workspace/FirstCircle/Pharo/20/'
The methods exists, isFile, isDirectory, and basename are defined on the
FileReference class. Notice that there is no message to get the path without
the basename and that the idiom is to use parent fullName to obtain it. The
message path returns a Path object which is internally used by FileSystem
and is not meant to be publicly used.
Note that FileSystem does not really distinguish between files and folders
which often leads to cleaner code and can be seen as an application of the
Composite design pattern.
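The queries discussed above can be tried without touching the hard drive by using a memory filesystem. The following is a small sketch; the names demo and notes.txt are arbitrary, and the comments show the results we expect under the behavior described in this section.

```smalltalk
| fs dir file |
fs := FileSystem memory.
dir := fs workingDirectory / 'demo'.
dir ensureCreateDirectory.
file := dir / 'notes.txt'.
file ensureCreateFile.
dir isDirectory.                "true"
file isFile.                    "true"
file basename.                  "'notes.txt'"
file parent = dir.              "true"
(dir / 'missing.txt') exists.   "false: building a reference creates nothing"
```

The same messages are sent to the directory and to the file, which illustrates the point made above: a reference does not need to know in advance which of the two it denotes.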
Querying file entry status. To get additional information about a filesystem entry, we should get a FileSystemDirectoryEntry using the message entry.
Note that you can also access the file permissions. Here are some examples:
cache entry creation.
2012-04-25T15:11:36+02:00
cache entry creationTime
2012-04-25T15:11:36+02:00
cache entry creationSeconds
3512812296 2012-08-02T14:23:29+02:00
cache entry modificationTime
2012-08-02T14:23:29+02:00
cache entry size.
0 (directories have size 0)
cache entry permissions
rwxr-xr-x
cache entry permissions class
FileSystemPermission
cache entry permissions isWritable
true
cache entry isFile
false
cache entry isDirectory
true
If you save a location with your image and move the image to a different machine or operating system, a location will still resolve to the expected
directory or file. Note that some file locations are specific to the virtual machine.
3.3
To open a stream on a file, just ask the reference for a read- or write-stream
using the message writeStream or readStream as follows:
| working stream |
working := FileSystem disk workingDirectory.
stream := (working / 'foo.txt') writeStream.
stream nextPutAll: 'Hello World'.
stream close.
stream := (working / 'foo.txt') readStream.
stream contents.
'Hello World'
stream close.
Please note that writeStream overwrites any existing file and readStream
throws an exception if the file does not exist. Forgetting to close a stream is
a common mistake that even advanced programmers regularly
make. Closing a stream frees low-level resources, so it should always be done.
The messages readStreamDo: and writeStreamDo: free the programmer from
explicitly closing the stream. Consider:
| working |
working := FileSystem disk workingDirectory.
working / 'foo.txt' writeStreamDo: [ :stream | stream nextPutAll: 'Hello World' ].
working / 'foo.txt' readStreamDo: [ :stream | stream contents ].
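As a hedged variation on the same idea, the sketch below captures the value answered by readStreamDo: in a variable, assuming that the message answers the value of its block. The file name streams-demo.txt is hypothetical; the file is removed before and after the experiment with ensureDelete (covered later in this chapter) so that no stale content interferes.

```smalltalk
| file text |
file := FileSystem disk workingDirectory / 'streams-demo.txt'.
"Start from a clean state; ensureDelete does nothing if the file is absent."
file ensureDelete.
"The stream is closed for us when the block finishes."
file writeStreamDo: [ :stream | stream nextPutAll: 'Hello World' ].
"Assumption: readStreamDo: answers the value of the block."
text := file readStreamDo: [ :stream | stream contents ].
"Clean up the temporary file."
file ensureDelete.
text
```

Neither block sends close, so there is no stream left open by mistake.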
Keep in mind that a file may easily be overwritten without any warning. Consider the following situation:
| working |
working := FileSystem disk workingDirectory.
working / 'authors.txt' readStreamDo: [ :stream | stream contents ].
'stephane alexandre damien jannik'
We can also use the message openFileStream: aString writable: aBoolean to get
a stream with the corresponding write status.
| stream |
stream := FileSystem disk openFileStream: 'authors.txt' writable: true.
stream nextPutAll: 'stephane alexandre damien jannik'.
3.4
Files may be copied and renamed using the messages copyTo: and renameTo:.
Note that while copyTo: takes another file reference as its argument, renameTo:
takes a path, pathname or reference.
| working |
working := FileSystem disk workingDirectory.
working / 'foo.txt' writeStreamDo: [ :stream | stream nextPutAll: 'Hello World' ].
working / 'foo.txt' copyTo: (working / 'bar.txt').
| working |
working := FileSystem disk workingDirectory.
working / 'bar.txt' readStreamDo: [ :stream | stream contents ].
'Hello World'
| working |
working := FileSystem disk workingDirectory.
working / 'foo.txt' renameTo: 'skweek.txt'.
| working |
working := FileSystem disk workingDirectory.
working / 'skweek.txt' readStreamDo: [ :stream | stream contents ].
'Hello World'
Copy everything. You can copy the contents of a directory using the message copyAllTo:. Here we copy the complete package-cache to the backup directory using copyAllTo::
cache copyAllTo: backup.
Note that before copying the target directory is created if it does not exist.
Deleting. To delete a single file, use the message delete:
(working / 'bar.txt') delete.
3.5
pf basenameWithoutExtension
'AsmJit-IgorStasenko.66'
pf base
'AsmJit-IgorStasenko'
pf extension
'mcz'
pf extensions
an OrderedCollection('66' 'mcz')
Sizes. FileReference also provides ways to access the size of the file.
pf humanReadableSize
'182.78 kB'
pf size
182778
File Information. You can get limited information about the file entry itself
using creationTime and permissions. To get the full information you should
access the entry itself using the message entry.
| pf |
pf := (FileSystem disk workingDirectory / 'package-cache' ) children second.
pf creationTime.
2012-06-10T10:43:19+02:00
pf modificationTime.
2012-06-10T10:43:19+02:00
pf permissions
rw-r--r--
Entries are objects that represent all the metadata of a single file.
| pf |
pf := (FileSystem disk workingDirectory / 'package-cache' ) children second.
pf entry
pf parent entries
"returns all the entries of the children of the receiver"
Operating on files
There are several operations on files.
Deleting. delete, deleteAll and deleteAllChildren all raise an
error if the receiver does not exist. delete deletes the file, deleteAll deletes the directory
and its contents, and deleteAllChildren deletes only the children of a directory.
In addition, deleteIfAbsent: executes a block when the file does not exist.
Finally, ensureDelete deletes the file but does not raise an error if the file does
not exist. Similarly, ensureDeleteAllChildren and ensureDeleteAll do not raise an exception when the receiver does not exist.
(FileSystem disk workingDirectory / 'paf') delete.
error
(FileSystem disk workingDirectory / 'fooFolder') deleteAll.
error
(FileSystem disk workingDirectory / 'fooFolder') ensureCreateDirectory.
(FileSystem disk workingDirectory / 'fooFolder') deleteAll.
(FileSystem disk workingDirectory / 'paf') deleteIfAbsent: [Warning signal: 'File did not exist'].
(FileSystem disk workingDirectory / 'fooFolder2') deleteAllChildren.
error
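The ensure variants are not exercised by the examples above. Here is a short sketch using a memory filesystem, so that nothing on disk is touched. The names scratch and paf are arbitrary, and the comments give the results we expect from the description above.

```smalltalk
| dir file |
dir := FileSystem memory workingDirectory / 'scratch'.
file := dir / 'paf'.
"Nothing exists yet: the ensure variants should simply do nothing."
file ensureDelete.
dir ensureDeleteAll.
"Create the directory and a file in it, then remove them again."
dir ensureCreateDirectory.
file ensureCreateFile.
file ensureDelete.
file exists.    "false"
dir ensureDeleteAll.
dir exists.     "false"
```

These variants are convenient in cleanup code, where it does not matter whether the file or directory was ever created.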
Creating Directory. createDirectory creates a new directory and raises an error if it already exists. ensureCreateDirectory checks whether the directory
exists and creates it only if necessary. ensureCreateFile creates a file if
necessary.
(FileSystem disk workingDirectory / 'paf' ) createDirectory.
[(FileSystem disk workingDirectory / 'paf' ) createDirectory] on: DirectoryExists do: [:ex| true].
true
(FileSystem disk workingDirectory / 'paf' ) delete.
(FileSystem disk workingDirectory / 'paf' ) ensureCreateDirectory.
(FileSystem disk workingDirectory / 'paf' ) ensureCreateDirectory.
(FileSystem disk workingDirectory / 'paf' ) isDirectory.
true
Moving/Copying files around. We can move files around using the message moveTo: which expects a file reference.
(FileSystem disk workingDirectory / 'targetFolder') exists
false
(FileSystem disk workingDirectory / 'paf') exists
false
(FileSystem disk workingDirectory / 'paf' ) moveTo: (FileSystem disk workingDirectory / 'targetFolder')
Error
(FileSystem disk workingDirectory / 'paf' ) ensureCreateFile.
(FileSystem disk workingDirectory / 'targetFolder') ensureCreateDirectory.
(FileSystem disk workingDirectory / 'paf' ) moveTo: (FileSystem disk workingDirectory / 'targetFolder' / 'paf').
(FileSystem disk workingDirectory / 'paf' ) exists.
false
(FileSystem disk workingDirectory / 'targetFolder' / 'paf') exists.
true
Besides moving files, we can copy them. We can also use copyAllTo: to
copy files. Here, we copy the files contained in the source folder to the target
one.
The message copyAllTo: performs a deep copy of the receiver, to a location
specified by the argument. If the receiver is a file, the file is copied. If the receiver is a directory, the directory and its contents will be copied recursively.
The argument must be a reference that does not exist; it will be created by
the copy.
(FileSystem disk workingDirectory / 'sourceFolder') createDirectory.
(FileSystem disk workingDirectory / 'sourceFolder' / 'pif') ensureCreateFile.
(FileSystem disk workingDirectory / 'sourceFolder' / 'paf') ensureCreateFile.
(FileSystem disk workingDirectory / 'targetFolder') createDirectory.
(FileSystem disk workingDirectory / 'sourceFolder') copyAllTo: (FileSystem disk workingDirectory / 'targetFolder').
(FileSystem disk workingDirectory / 'targetFolder' / 'pif') exists.
true
(FileSystem disk workingDirectory / 'targetFolder' / 'paf') exists.
true
Locator
Locators are late-bound references. They are left deliberately fuzzy, and are
only resolved to a concrete reference when some file operation is performed.
Instead of a filesystem and path, locators are made up of an origin and a
path. An origin is an abstract filesystem location, such as the user's home
directory, the image file, or the VM executable. When it receives a message
like isFile, a locator will first resolve its origin, then resolve its path against
the origin.
Locators make it possible to specify things like "an item named package-cache in the same directory as the image file" and have that specification
remain valid even if the image is saved and moved to another directory, possibly on a different computer.
locator := FileLocator imageDirectory / 'package-cache'.
locator printString.
' {imageDirectory}/package-cache'
locator resolve.
/Users/ducasse/Pharo/PharoHarvestingFixes/20/package-cache
locator isFile.
false
true
References and Locators also provide simple methods for dealing with
whole directory trees.
3.6
FileSystem
A filesystem is an interface to access hierarchies of directories and files.
"The filesystem," provided by the host operating system, is represented by
DiskStore and its platform-specific subclasses. However, the user should not
access them directly but instead use FileSystem as we showed previously.
Other kinds of filesystems are also possible. The memory filesystem provides a RAM disk filesystem where all files are stored as ByteArrays in the
image. The zip filesystem represents the contents of a zip file.
Each filesystem has its own working directory, which is used to resolve
any relative paths that are passed to it. Some examples:
fs := FileSystem memory.
fs workingDirectoryPath: (Path / 'plonk').
griffle := Path / 'plonk' / 'griffle'.
nurp := Path * 'nurp'.
fs resolve: nurp.
Path/plonk/nurp
fs createDirectory: (Path / 'plonk').
"/plonk created"
(fs writeStreamOn: griffle) close.
"/plonk/griffle created"
fs isFile: griffle.
true
fs isDirectory: griffle.
false
fs copy: griffle to: nurp.
"/plonk/griffle copied to /plonk/nurp"
fs exists: nurp.
true
fs delete: griffle.
"/plonk/griffle deleted"
fs isFile: griffle.
false
fs isDirectory: griffle.
false
Path
Paths are the most fundamental element of the FileSystem API. They represent filesystem paths in a very abstract sense, and provide a high-level
protocol for working with paths without having to manipulate strings. Here
are some examples showing how to define absolute paths (/), relative paths
(*), file extension (,), parent navigation (parent). Normally you do not need to
use Path but here are some examples.
| fs griffle nurp |
fs := FileSystem memory.
griffle := fs referenceTo: (Path / 'plonk' / 'griffle').
nurp := fs referenceTo: (Path * 'nurp').
griffle isFile.
false
griffle isDirectory.
false
griffle parent ensureCreateDirectory.
griffle ensureCreateFile.
griffle exists & griffle isFile.
true
griffle copyTo: nurp.
nurp exists.
true
griffle delete
"absolute path"
Path / 'plonk' / 'feep'
/plonk/feep
"relative path"
Path * 'plonk' / 'feep'
plonk/feep
griffle.txt
"parent directory"
(Path / 'plonk' / 'griffle') parent
griffle.jpeg
/plonk
"resolving a string"
(Path * 'griffle') resolve: 'plonk'
griffle/plonk
"comparing"
(Path / 'plonk') contains: (Path / 'griffle' / 'nurp')
false
Note that some of the path protocol (messages like /, parent and resolve:)
are also available on references.
Visitors
The above methods are sufficient for many common tasks, but application
developers may find that they need to perform more sophisticated operations on directory trees.
The visitor protocol is very simple. A visitor needs to implement visitFile:
and visitDirectory:. The actual traversal of the filesystem is handled by a guide.
A guide works with a visitor, crawling the filesystem and notifying the visitor of the files and directories it discovers. There are three Guide classes,
PreorderGuide, PostorderGuide and BreadthFirstGuide , which traverse the filesystem in different orders. To arrange for a guide to traverse the filesystem with
a particular visitor is simple. Here's an example:
BreadthFirstGuide show: aReference to: aVisitor
The enumeration methods described above are implemented with visitors; see CopyVisitor, DeleteVisitor, and CollectVisitor for examples.
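To illustrate the protocol, here is a minimal sketch of a visitor that simply counts what a guide shows it. The class EntryCountingVisitor, its category and its accessors are hypothetical. We also assume that implementing visitFile: and visitDirectory: is all a guide requires, and we do not rely on whether the argument is a reference or a directory entry.

```smalltalk
"Hypothetical visitor that counts the files and directories it is shown."
Object subclass: #EntryCountingVisitor
    instanceVariableNames: 'files directories'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'MyProject-FileSystem'

EntryCountingVisitor>>initialize
    super initialize.
    files := 0.
    directories := 0

"Called by the guide for each file it discovers."
EntryCountingVisitor>>visitFile: anEntry
    files := files + 1

"Called by the guide for each directory it discovers."
EntryCountingVisitor>>visitDirectory: anEntry
    directories := directories + 1

EntryCountingVisitor>>files
    ^ files

EntryCountingVisitor>>directories
    ^ directories

"Usage, to be evaluated in a workspace:"
| visitor |
visitor := EntryCountingVisitor new.
PreorderGuide
    show: FileSystem disk workingDirectory / 'package-cache'
    to: visitor.
visitor files
```

Swapping PreorderGuide for PostorderGuide or BreadthFirstGuide should change only the order of the notifications, not the totals.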
3.7
Chapter summary
FileSystem is a powerful and elegant library to manipulate files. It is a fundamental part of Pharo. The Pharo community will continue to extend and
build it. The class FileReference is the most important entry point to the framework.
FileSystem offers factory class methods to build file systems on hard
disk and in memory.
FileReference is a central class in the framework which represents a file
or a folder. A file reference offers methods to operate on a file and
navigate within a file system.
Sending the message asFileReference to a string returns its corresponding file reference (e.g., '/tmp' asFileReference).
Chapter 4
Sockets
written by:
Noury Bouraqadi (Noury.Bouraqadi@mines-douai.fr)
Luc Fabresse (Luc.Fabresse@mines-douai.fr)
Modern software often involves multiple devices that collaborate through
a network. The basic approach to set up such collaborations is to use sockets. A typical use is in the World Wide Web. Browsers and servers interact
through sockets that carry HTTP requests and responses.
The concept of the socket was first introduced by researchers at the University
of California, Berkeley, in the early 1980s. They defined the first socket API for the C programming language in the context of Unix operating systems. Since then, the
concept has spread to other operating systems, and its API has been ported
to almost all programming languages.
In this chapter, we present the API of sockets in the context of Pharo.
We first show through some examples how to use sockets for building both
clients and servers. The notions of client and server are inherent in sockets: a
server waits for requests emitted by clients. Then, we introduce SocketStream
and how to use it. In practice, one is likely to use SocketStream instead of
plain sockets. The chapter ends with a description of some Unix networking
utilities that are useful for experimenting.
4.1
Basic Concepts
Socket
A remote communication involves at least two system processes exchanging some data bytes through a network. Each process accesses the network
through at least one socket (see Figure 4.1). A socket can then be defined as
a plug on a communication network.
Area Networks.
showing the use of client sockets to interact with a web server. Next, Section 4.3 presents server sockets. We describe their life-cycle and how to use
them to implement a server that can handle concurrent connections. Last,
we introduce socket streams in Section 4.4. We give an overview of their
benefits by describing their use on both client and server side.
4.2
TCP Client
Name System: basically a directory that maps device names to their IP address.
which is the generic address to refer to the machine that runs your software
(Pharo here).
Script 4.1: Creating a Socket Address
| esugAddress localAddress |
esugAddress := NetNameResolver addressForName: 'www.esug.org'.
localAddress := NetNameResolver addressForName: '127.0.0.1'.
Now we can connect our TCP socket to the server as shown in Script 4.2.
Message connectTo:port: attempts to connect the socket to the server using the
server address and port provided as parameters. The server address refers
to the address of the network interface (e.g. ethernet, wifi) used by the server.
The port refers to the communication endpoint on the network interface.
Each network interface has for each IP transport protocol (e.g. TCP, UDP)
a collection of ports that are numbered from 0 to 65535. For a given protocol,
a port number on an interface can only be used by a single process.
Script 4.2: Connecting a TCP Socket to ESUG Server.
| clientSocket serverAddress |
clientSocket := Socket newTCP.
serverAddress := NetNameResolver addressForName: 'www.esug.org'.
clientSocket
connectTo: serverAddress port: 80;
waitForConnectionFor: 10.
clientSocket isConnected
true
Script 4.3: Exchanging Data with some Server through a TCP Socket.
| clientSocket data |
... "create and connect the TCP clientSocket"
clientSocket sendData: 'Hello server'.
data := clientSocket receiveData.
... "Process data"
Script 4.3 shows the protocol to send and receive data through a client
socket. Here, we send the string 'Hello server' to the server using the sendData:
message. Next, we send the receiveData message to our client socket to read
the answer. Note that reading the answer is blocking, meaning receiveData
returns only once a response has been read. Then, the contents of the variable data are
processed.
Script 4.4: Bounding the Maximum Time for Data Reception.
|clientSocket data|
... "create and connect the TCP clientSocket"
[data := clientSocket receiveDataTimeout: 5.
... "Process data"
] on: ConnectionTimedOut
do: [ :timeOutException |
self
crLog: 'No data received!';
crLog: 'Network connection is too slow or server is down.']
Note that by using receiveData, the client waits until the server either sends no more data, or closes the connection. This means that the
client may wait indefinitely. An alternative is to have the client signal a
ConnectionTimedOut exception if it has waited for too long, as shown in
Script 4.4. We use the message receiveDataTimeout: to ask the client socket to wait
for 5 seconds. If data is received during this period of time, it is processed
silently. But if no data is received during the 5 seconds, a ConnectionTimedOut
exception is signaled. In the example we log a description of what happened.
Close a Socket
A TCP socket remains alive while devices at both ends are connected. A
socket is closed by sending the message close to it. The socket remains connected until the other side closes it. This may last indefinitely when there
is a network failure or when the other side is down. This is why sockets
also accept the destroy message, which frees system resources required by
the socket.
In practice we use closeAndDestroy. It first attempts to close the socket by
sending the close message. Then, if the socket is still connected after a duration of 20 seconds, the socket is destroyed. Note that there exists a variant
4.3
TCP Server
Now, let us build a simple TCP server. A TCP server is an application that
awaits TCP connections from TCP clients. Once a connection is established,
both the server and the client can send and receive data in any order. A big
difference between the server and the client is that the server uses at least
two sockets. One socket is used for handling client connections, while the
second serves for exchanging data with a particular client.
while exchanging data with possibly multiple clients through multiple interactionSockets (one per client). In the following, we first illustrate the socket
serving machinery. Then, we describe a complete server class and explain
the server life-cycle and related concurrency issues.
First, we create the socket that we will use for handling incoming connections. We configure it to listen on port 9999. The backlogSize is set to 10,
meaning that we ask the Operating System to allocate a buffer for 10 connection requests. This backlog will not be actually used in our example. But, a
more realistic server will have to handle multiple connections and then store
pending connection requests into the backlog.
Once the connection socket (referenced by variable connectionSocket) is set
up, it starts listening for client connections. The waitForAcceptFor: 60 message
makes the socket wait for connection requests for 60 seconds. If no client attempts to connect during these 60 seconds, the message answers nil. Otherwise, we get a new socket, interactionSocket, connected to the client's socket. At
this point, we do not need the connection socket anymore, so we can close it
(connectionSocket closeAndDestroy message).
Since the interaction socket is already connected to the client, we can use
it to exchange data. Messages receiveData and sendData: presented above (see
Section 4.2) can be used to achieve this goal. In our example, we wait for
data from the client and next display it on the Transcript. Lastly, we send it
back to the client prefixed with the 'ECHO: ' string, finishing the interaction
with the client by closing the interaction socket.
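The steps just described can be sketched as follows. This is a minimal sketch assembled from the messages named in the text (listenOn:backlogSize:, waitForAcceptFor:, receiveData, sendData: and closeAndDestroy), not a definitive listing. The variable receivedData, the nil guard and the ensure: block are assumptions added for illustration.

```smalltalk
| connectionSocket interactionSocket receivedData |
"Socket dedicated to incoming connection requests."
connectionSocket := Socket newTCP.
connectionSocket listenOn: 9999 backlogSize: 10.
"Wait up to 60 seconds for a client; the text states that nil is answered on timeout."
interactionSocket := connectionSocket waitForAcceptFor: 60.
"The connection socket is no longer needed in this single-client example."
connectionSocket closeAndDestroy.
interactionSocket ifNotNil: [
	[ receivedData := interactionSocket receiveData.
	  "Display what the client sent on the Transcript."
	  receivedData crLog.
	  "Echo it back, prefixed as described above."
	  interactionSocket sendData: 'ECHO: ' , receivedData ]
		ensure: [ interactionSocket closeAndDestroy ] ]
```

Evaluating this in a workspace blocks the UI process while the server waits, which is why the tests below rely on an external client.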
There are different options to test the server of Script 4.6. The first simple
one is to use the nc (netcat) utility discussed in Section 4.5. First run the
server script in a workspace. Then, in a terminal, evaluate the following
command line:
echo "Hello Pharo" | nc localhost 9999
As a result, on the Transcript of the Pharo image, the following line should
be displayed:
Hello Pharo
A pure Pharo alternative relies on using two different images: one that
runs the server code and the other for client code. Indeed, since our examples run within the user interaction process, the Pharo UI will be frozen at
some points, such as during the waitForAcceptFor:. Script 4.7 provides the code
to run on the client image. Note that you have to run the server code first.
Otherwise, the client will fail. Note also that after the interaction, both the
client and the server terminate. So, if you want to run the example a second
time, you need to run both sides again.
Script 4.7: Echo Client.
| clientSocket serverAddress echoString |
serverAddress := NetNameResolver addressForName:'127.0.0.1'.
clientSocket := Socket newTCP.
[ clientSocket
connectTo: serverAddress port: 9999;
waitForConnectionFor: 10.
clientSocket sendData: 'Hello Pharo!'.
echoString := clientSocket receiveDataTimeout: 5.
echoString crLog.
] ensure: [ clientSocket closeAndDestroy ].
The isRunning instance variable is a flag that is set to true while the server
is running. As we will see below, it can be accessed by different processes.
Therefore, we need to ensure that the value can be read safely in the presence of multiple write accesses. This is achieved using a lock (isRunningLock instance variable) that guarantees that isRunning is accessed by only a single process
at a time.
Method 4.9: The EchoServer>>isRunning Read Accessor
EchoServer>>isRunning
    ^ isRunningLock critical: [ isRunning ]
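For symmetry, a write accessor can be protected by the same lock. The sketch below is an assumption about how such a method might look. The text only says that isRunningLock is a lock, so using a Mutex and initializing it in initialize are illustrative choices.

```smalltalk
EchoServer>>isRunning: aBoolean
	"Write accessor guarded by the same lock as the read accessor."
	isRunningLock critical: [ isRunning := aBoolean ]

EchoServer>>initialize
	"Assumption: the lock is a Mutex created once per server instance."
	super initialize.
	isRunningLock := Mutex new.
	isRunning := false
```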
is created and made to listen on port 9999. The backlog size is set to 10, which, as mentioned above, means that the system allocates a buffer for storing 10 pending client connection requests. This value is a trade-off that depends on how
fast the server is (depending on the VM and the hardware) and the maximum rate of client connection requests. The backlog size has to be large
enough to avoid losing any connection request, but not too big, to avoid
wasting memory. Finally, the EchoServer>>start method creates a process by sending the fork message to the [ self serve ] block. The created process has the same
priority as the creator process (i.e., the one that performs the EchoServer>>start
method, the UI process if you have executed it from a workspace).
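Put together, a start method along these lines could look like the sketch below. It is a reconstruction from the description, not the book's listing. The guard against starting twice and the isRunning: write accessor are assumptions, and the nested use of the lock relies on isRunningLock being a re-entrant Mutex.

```smalltalk
EchoServer>>start
	"Assumption: isRunningLock is a re-entrant Mutex, so calling the accessors inside critical: is safe."
	isRunningLock critical: [
		self isRunning ifTrue: [ ^ self ].
		self isRunning: true ].
	"Create the connection socket and listen with a backlog of 10 pending requests."
	connectionSocket := Socket newTCP.
	connectionSocket listenOn: 9999 backlogSize: 10.
	"Serve clients in a separate process with the same priority as the caller."
	[ self serve ] fork
```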
Method 4.15: The EchoServer>>interactOnConnection Method
EchoServer>>interactOnConnection
    | interactionSocket |
    interactionSocket := connectionSocket waitForAcceptFor: 1 ifTimedOut: [ ^ self ].
    [ self interactUsing: interactionSocket ] fork
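Because interactOnConnection returns after at most one second when no client shows up, a serving loop can re-check the isRunning flag regularly. The sketch below illustrates that idea. The body of serve is an assumption based on the [ self serve ] block mentioned earlier, not a reproduction of the original method.

```smalltalk
EchoServer>>serve
	"Accept clients until the flag is cleared, then release the connection socket."
	[ [ self isRunning ] whileTrue: [ self interactOnConnection ] ]
		ensure: [ connectionSocket closeAndDestroy ]
```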
4.4
SocketStream
the data exchange by providing buffering together with a set of facility methods. It provides an easy-to-use API on top of Socket.
The first line creates a stream that encapsulates a newly created socket
connected to the provided server. This is the responsibility of the message
openConnectionToHostNamed:port:. It suspends the execution until the connection with the server is established. If the server does not respond, the socket
stream signals a ConnectionTimedOut exception. This exception is actually signaled by the underlying socket. The default timeout delay is 45 seconds
(defined in method Socket class>>standardTimeout). One can choose a different
value using the SocketStream>>timeout: method.
Once our socket stream is connected to the server, we forge and send an
HTTP GET query. Notice that compared to Script 4.5 (page 36), we skipped
one final String crlf (Script 4.17). This is because the SocketStream>>sendCommand:
method automatically inserts CR and LF characters after sending data to
mark the line ending.
Reception of the requested web page is triggered by sending the nextLine
message to our socket stream. It will wait for a few seconds until data is
received. Data is then displayed on the transcript. We safely ensure that the
connection is closed.
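The interaction described above can be sketched as follows. The host www.esug.org is borrowed from the earlier client scripts, and the exact header lines are assumptions chosen for illustration. The final blank line of the HTTP request comes from the CR LF that sendCommand: appends.

```smalltalk
| stream httpQuery |
"Blocks until connected; a ConnectionTimedOut is signaled if the server does not answer."
stream := SocketStream openConnectionToHostNamed: 'www.esug.org' port: 80.
"No trailing String crlf here: sendCommand: appends the last CR LF itself."
httpQuery := 'GET / HTTP/1.1' , String crlf ,
	'Host: www.esug.org' , String crlf.
[ stream sendCommand: httpQuery.
  "Display only the first line of the response."
  stream nextLine crLog ]
	ensure: [ stream close ]
```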
In this example, we only display the first line of the response sent by the
server. We can easily display the full response, including the HTML code, by
sending the upToEnd message to our socket stream. Note, however, that you
will have to wait a bit longer compared to displaying a single line.
A server relying on socket streams still uses a socket for handling incoming connection requests. Socket streams come into action once a socket is
created for interaction with a client. The socket is wrapped into a socket
stream that eases data exchange using messages such as sendCommand: or
nextLine. Once we are done, we close and destroy the socket handling connections and we close the interaction socket stream. The latter will take care
of closing and destroying the underlying interaction socket.
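A server following this description might look like the sketch below. The greeting text, the 30 second wait and the nil guards are illustrative assumptions. SocketStream on: wraps the already connected interaction socket.

```smalltalk
| connectionSocket interactionSocket interactionStream |
connectionSocket := Socket newTCP.
[ connectionSocket listenOn: 9999 backlogSize: 10.
  interactionSocket := connectionSocket waitForAcceptFor: 30.
  interactionSocket ifNotNil: [
	"Wrap the interaction socket to get the line-oriented stream API."
	interactionStream := SocketStream on: interactionSocket.
	interactionStream sendCommand: 'Greetings from Pharo Server'.
	interactionStream nextLine crLog ] ]
	ensure: [
		"Close the connection socket, then the stream, which closes its own socket."
		connectionSocket closeAndDestroy.
		interactionStream ifNotNil: [ interactionStream close ] ]
```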
openConnectionToHostNamed: 'localhost'
port: 9999.
interactionStream binary.
interactionStream nextPutAllFlush: #[65 66 67].
interactionStream upToEnd.
Note that whether the client manages strings (ASCII mode) or byte arrays (binary
mode) has no impact on the server. Indeed, in ASCII mode, the socket stream
handles instances of ByteString, so each character maps to a single byte.
Delimiting Data
SocketStream acts simply as a gateway to some network. It sends or reads
bytes without giving them any semantics. The semantics, that is, the organization and meaning of exchanged data, should be handled by other objects.
Developers should decide on a protocol and enforce it on both interacting sides to have correct interaction.
A good practice is to reify a protocol, that is, to materialize it as an object which wraps a socket stream. The protocol object analyzes exchanged
data and decides accordingly which messages to send to the socket stream.
The entities involved in any conversation need a protocol that defines how to organize data into a sequence of bytes or characters. Senders should conform
to this organization to allow receivers to extract valid data from the received
sequence of bytes.
One possible solution is to have a set of delimiters inserted between the bytes
or characters corresponding to each piece of data. An example of a delimiter is the sequence of ASCII characters CR and LF. This sequence is considered so useful that the developers of the SocketStream class introduced the sendCommand:
message. This method (illustrated in script 4.5) appends CR and LF after sent
data. When reading CR followed by LF the receiver knows that the received
sequence of characters is complete and can be safely converted into valid
data. A facility method nextLine (illustrated in script 4.17) is implemented by
SocketStream to perform reading until the reception of CR+LF sequence. One
can however use any character or byte as a delimiter. Indeed, we can ask a
socket stream to read all characters/bytes up to some specific one using the
upTo: message.
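As a small illustration of a custom delimiter, the sketch below reads one message terminated by a semicolon. It assumes a server listening on port 9999 of the local machine that sends text ending with that character; the choice of $; is arbitrary.

```smalltalk
| stream message |
stream := SocketStream openConnectionToHostNamed: 'localhost' port: 9999.
[ "Read everything up to, but not including, the first semicolon."
  message := stream upTo: $;.
  message crLog ]
	ensure: [ stream close ]
```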
The advantage of using delimiters is that they handle data of arbitrary size.
The drawback is that we need to analyze received bytes or characters to find
the limits, which is resource consuming. An alternative approach is to exchange bytes or characters organized in chunks of a fixed size. A typical use
of this approach is for streaming audio or video contents.
Script 4.20: A content streaming source sending data in chunks.
To read data in chunks, SocketStream responds to the next: message as illustrated by Script 4.21. We consider that we have a server running at port 9999
of our machine that sends a string whose size is a multiple of 5. Right after
the connection, we wait 100 milliseconds until the data is received. Then, we
read data in chunks of five characters that we display on the Transcript. So,
if the server sends a string with ten characters 'HelloWorld', we will get on the
Transcript Hello on one line and World on a second line.
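The reading side just described can be sketched as follows. This is an illustrative reconstruction under the stated assumptions (a local server on port 9999 sending a string whose size is a multiple of 5); the use of isDataAvailable to end the loop is a choice made for this sketch.

```smalltalk
| interactionStream |
interactionStream := SocketStream openConnectionToHostNamed: 'localhost' port: 9999.
[ "Give the server a moment to send its data."
  (Delay forMilliseconds: 100) wait.
  "Read and display chunks of five characters while data is pending."
  [ interactionStream isDataAvailable ]
	whileTrue: [ (interactionStream next: 5) crLog ] ]
	ensure: [ interactionStream close ]
```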
4.5
In sections related to client-side sockets and socket streams, we used interactions with a web server as an example. So, we forged an HTTP GET query
and sent it to the server. We chose these examples to make experiments
straightforward and platform agnostic. In real-scale applications, interactions involving HTTP should be coded using a higher-level library such as the
Zinc HTTP Client/Server library, which is part of the default Pharo distribution (see footnote 4).
Network programming can easily scale up in complexity. Using a toolbox outside Pharo is often necessary to identify what the source of an odd
behavior is. This section lists a number of Unix utilities to deal with low
level network operations. Readers with a Unix machine (Linux, Mac OS X)
or with Cygwin (for Windows) can use nc (or netcat), netstat and lsof for their
tests.
nc (netcat)
nc allows one to set up either a client or a server for both TCP (default protocol) and UDP. It redirects the content of its stdin to the other side. The following snippet shows how to send 'Hello from a client' to a server on the local
The command line below starts a server listening on port 9090 that sends
'Hi from server' to the first client to connect. It terminates after the interaction.
echo Hi from server | nc -l 9090
You can keep the server running by means of option -k. But, the string
produced by the preceding echo is sent only to the first client to connect. An
alternative solution is to make the nc server send text while you type. Simply
evaluate the following command line:
nc -lk 9090
Type in some text in the same terminal where you started the server.
Then, run a client in another terminal. Your text will be displayed on the
client side. You can repeat these two last actions (type text at the server side,
then start client) as many times as needed.
You can even go more interactive by making the connection between a
client and a server more persistent. By evaluating the following command
4 http://zn.stfx.eu/zn/index.html
line, the client sends every line (ended with "Enter"). It will terminate when
sending the EOF signal (ctl-D).
cat | nc -l 9090
netstat
This command provides information on network interfaces and sockets of
your computer. It provides many statistics so one needs to use appropriate
options to filter out information. The following command line displays the status of TCP sockets and their addresses. Note that the port numbers
and addresses are separated by a dot.
netstat -p tcp -a -n
lsof
The lsof command lists all files open in your system. This of course includes
sockets, since everything is a file in Unix. Why is lsof useful, you would ask,
if we already have netstat? The answer is that lsof shows the link between
processes and sockets. So you can find sockets related to your program.
The following command line lists TCP sockets. The
n and P options force lsof to display host addresses and ports as numbers.
lsof -nP -i tcp
4.6
Chapter summary
This chapter introduces the use of TCP sockets and socket streams to develop
both network clients and servers. It has reviewed the survival kit of network
programming:
Sockets are low-level bi-directional communication gateways, instances of class Socket.
Socket-based programming always involves one server and one or
more clients.
A server waits for requests emitted by clients.
Messages sendData: and receiveData are the socket primitives to send and
receive data.
5 http://smalltalkhub.com/#!/~CAR/rST/
Chapter 5
5.1
Settings architecture
The control flow of a subsystem does not involve Settings. This is the major point of difference between Settings and the preference system available
in Pharo 1.0.
Vocabulary
A preference is a particular value which is managed as a variable value. Basically such a preference value is stored in a class variable or in an instance
variable of a singleton and is directly managed through the use of simple
accessors. Pharo contains numerous preferences such as the user interface
theme, the desktop background color or a boolean flag to allow or prohibit
the use of sound. We will show how we can define a preference in Section 5.3.
A setting is a declaration (description) of a preference value. To be viewed
and updated through the setting browser, a preference value must be described by a setting. Such a setting is built by a particular method tagged
with a specific pragma. This specific pragma serves as a classification tag
which is used to automatically identify the method as a setting (see Figure 5.1). Section 5.3 explains how to declare a setting.
Pharo users need to browse existing preferences and eventually change
their value through a dedicated user interface. This is the major role of the
Settings Browser presented in Section 5.2.
Figure 5.1 shows important points of the architecture put in place by Settings: The Settings package can be unloaded and a package defining preferences does not depend on the Settings package. This architecture is supported by the following points:
Customization points. Each application's customization points should be defined. In Figure 5.1, the class RealStateAgent of the package UI-Basic
defines the class variable UsedStrategy, which determines where the windows appear. The flow of the package UI-Basic is modular and
self-contained: the class RealStateAgent does not depend on the settings framework. The class RealStateAgent has been designed to be
parametrized.
Description of customization point. The Settings framework supports the
description of the setting UsedStrategy. In Figure 5.1, the package UI-Basic Setting defines a method (it could be an extension to the class
RealStateAgent or another class). The important point is that the method
declaring the setting does not refer directly to Setting classes but describes the setting using a builder. This way the description could even
be present in the UI-Basic package without introducing a reference.
Collecting setting for user presentation. The Settings package defines
tools to manage settings such as a Settings Browser that the user opens
to change her/his preferences. The Settings Browser collects settings
and uses their description to change the value of preferences. The
control flow of the program and the dependencies are always from
the package Settings to the package that has preferences and not the
inverse.
5.2
The Settings Browser, shown in Figure 5.2, mainly allows one to browse all
currently declared settings and to change related preference values.
To open the Settings Browser, just use the World menu
(World > System > Settings) or evaluate the following expression:
SettingBrowser open
The settings are presented in several trees in the middle panel. Setting searching and filtering is available from the top tool-bar whereas the bottom panels
show currently selected setting descriptions (left bottom panel) and current
package set (right bottom panel).
by hitting the return key (or with cmd-s). If such a setting value is changed
often, the drop-list widget comes in handy because you can retrieve and use
previously entered values in one click! Moreover, in case of a FileName or
a DirectoryName, a button is added to open a file name or a directory name
chooser dialog.
Other possible actions are all accessible from the contextual menu. Depending on the selected setting, they may be different. The two possible
versions are shown in Figure 5.3.
Expand all (a): expand all the setting tree nodes recursively. It is also
accessible via the keyboard shortcut cmd-a.
Collapse all (a): collapse all the setting tree nodes recursively. It is also
accessible via the keyboard shortcut cmd-A.
Expand all from here: Expand the currently selected setting tree node
recursively.
Browse (b): open a system browser on the method that declares the
setting. It is also accessible via the keyboard shortcut cmd-b or if you
double-click on a setting. It is very handy if you want to change the
setting implementation or simply see how it is implemented to understand the framework by investigating some examples (how to declare
a setting is explained in Section 5.3).
Display export action string: a setting can be exported as a start-up
action. This menu option displays how the start-up action is
coded (start-up action management is explained in Section 5.7).
Set to default (d): set the selected setting value to the default one. It is
useful if, as an example, you have played with a setting to observe its
effect and finally decide to come back to its default.
Empty list (e): If the input widget is an editable drop-list, this menu
item allows one to forget previously entered values by emptying the
recorded list.
5.3
Declaring a setting
All global preferences of Pharo can be viewed or changed using the Settings
Browser. A preference is typically a class variable or an instance variable of a
singleton. If one wants to be able to change a value from the SettingsBrowser,
then a setting must be declared for it. A setting is declared by a particular
class method that should be implemented as follows: it takes a builder as
argument and it is tagged with the <systemsettings> pragma.
The argument, aBuilder, serves as an API or facade for building setting
declarations. The pragma allows the Settings Browser to dynamically discover current setting declarations.
The important point is that a setting declaration should be package specific. It means that each package is responsible for the declaring of its own
settings. For a particular package, specific settings are declared by one or
several of its classes or a companion package. There is no global setting
defining class or package (as was the case in Pharo 1.0). The direct benefit is
that when the package is loaded, then its settings are automatically loaded.
When a package is unloaded, then its settings are automatically unloaded.
In addition, a Setting declaration should not refer to any Setting class but to
the builder argument. This ensures that your application is not dependent
on Settings and that you will be able to remove Setting if you want to define
extremely small footprint applications.
Let's take the example of the caseSensitiveFinds preference. It is a boolean
preference which is used for text searching. If it is true, then text finding is
case sensitive. This preference is stored in the CaseSensitiveFinds class variable
of the class TextEditor. Its value can be queried and changed by, respectively,
TextEditor class>>caseSensitiveFinds and TextEditor class>>caseSensitiveFinds: given
below:
TextEditor class>>caseSensitiveFinds
^ CaseSensitiveFinds ifNil: [CaseSensitiveFinds := false]
TextEditor class>>caseSensitiveFinds: aBoolean
CaseSensitiveFinds := aBoolean
To define a setting for this preference (i.e., for the CaseSensitiveFinds class
variable) and be able to see it and change it from the Settings Browser, the
method below is implemented. The result is shown in the screenshot in
Figure 5.4.
CodeHolderSystemSettings class>>caseSensitiveFindsSettingsOn: aBuilder
    <systemsettings>
    (aBuilder setting: #caseSensitiveFinds)
        target: TextEditor;
        label: 'Case sensitive search' translated;
        description: 'If true, then the "find" command in text will always make its searches in a case-sensitive fashion' translated;
        parent: #codeEditing.
The header
CodeHolderSystemSettings class>>caseSensitiveFindsSettingsOn: aBuilder
...
The pragma
A setting declaration is tagged with the <systemsettings> pragma.
CodeHolderSystemSettings class>>caseSensitiveFindsSettingsOn: aBuilder
<systemsettings>
...
In fact, when the settings browser is opened, it first collects all settings declarations by searching all methods with the <systemsettings> pragma. In addition, if you compile a setting declaration method while a Settings Browser is
opened then it is automatically updated with the new setting.
argument is considered as the selector used by the Settings Browser to get the
preference value. The selector for changing the preference value is by default
built by adding a colon to the getter selector (i.e., it is caseSensitiveFinds: here).
These selectors are sent to a target which is by default the class in which the
method is implemented (i.e., CodeHolderSystemSettings). Thus, this one line
setting declaration is sufficient if caseSensitiveFinds and caseSensitiveFinds: accessors are implemented in CodeHolderSystemSettings.
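Such a minimal declaration, relying entirely on the defaults described above, could be sketched as follows. It only works as-is under the stated assumption that the caseSensitiveFinds and caseSensitiveFinds: accessors are implemented on the class side of CodeHolderSystemSettings.

```smalltalk
CodeHolderSystemSettings class>>caseSensitiveFindsSettingsOn: aBuilder
	<systemsettings>
	"Default target: the class that implements this method.
	 Default setter: the getter selector with a colon appended."
	aBuilder setting: #caseSensitiveFinds
```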
In fact, very often, these default initializations will not fit your need. Of
course you can adapt the setting node configuration to take into account
your specific situation. For example, the corresponding getter and setter accessors for the caseSensitiveFinds setting are implemented in the class TextEditor. Then, we should explicitly set that the target is TextEditor. This is done by
sending the message target: to the setting node with the target class TextEditor
passed as argument as shown by the updated definition:
CodeHolderSystemSettings class>>caseSensitiveFindsSettingsOn: aBuilder
<systemsettings>
(aBuilder setting: #caseSensitiveFinds)
target: TextEditor
This very short version is fully functional and enough to be compiled and
taken into account by the Settings Browser as shown by Figure 5.5.
Don't forget to send translated to the label and the description strings; it
will greatly facilitate the translation into other languages.
Concerning the classification and the settings tree organization, there are
several ways to improve it. This point is fully detailed in the next section.
One can use this expression to configure the target of a corresponding setting.
As an example, the #glyphContrast preference could be declared as follows:
(aBuilder setting: #glyphContrast)
target: FreeTypeSettings current;
label: 'Glyph contrast' translated;
...
This is simple, but unfortunately, declaring such a singleton target like this
is not a good idea. This declaration is not compatible with the Setting style
functionalities (see Section ??). In such a case, one would have to separately
indicate the target class and the message selector to send to the target class
to get the singleton. Thus, as shown in the example below, you should use
the targetSelector: message:
(aBuilder setting: #glyphContrast)
target: FreeTypeSettings;
targetSelector: #current;
label: 'Glyph contrast' translated;
...
5.4
Within the Settings Browser, settings are organized in trees where related settings are shown as children of the same parent.
Declaring a parent
The simplest way to declare your setting as a child of another setting is to
use the parent: message with the identifier of the parent setting passed as argument. In the example below, the parent node is an existing node declared
with the #codeEditing identifier.
CodeHolderSystemSettings class>>caseSensitiveFindsSettingsOn: aBuilder
    <systemsettings>
    (aBuilder setting: #caseSensitiveFinds)
        target: TextEditor;
        label: 'Case sensitive search' translated;
        description: 'If true, then the "find" command in text will always make its searches in a case-sensitive fashion' translated;
        parent: #codeEditing.
The #codeEditing node is also declared somewhere in the system. For example,
it could be defined as a group as we will see now.
Declaring a group
A group is a simple node without any value and which is only used for children grouping. The node identified by #codeEditing is created by sending the
group: message to the builder with its identifier passed as argument. Notice
also that, as shown in Figure 5.4, the #codeEditing node is not at root because
it has declared itself as a child of the #codeBrowsing node.
CodeHolderSystemSettings class>>codeEditingSettingsOn: aBuilder
<systemsettings>
(aBuilder group: #codeEditing)
label: 'Editing' translated;
parent: #codeBrowsing.
Declaring a sub-tree
Being able to declare its own settings as a child of a pre-existing node is very
useful when a package wants to enrich existing standard settings. But it can
also be very tedious for settings which are very application specific.
Thus, directly declaring a sub-tree of settings in one method is also possible. Typically, a root group is declared for the application settings and the
children settings themselves are also declared within the same method. This
is simply done through the sending of the with: message to the root group.
The with: message takes a block as argument. In this block, all new settings
are implicitly declared as children of the root group (the receiver of the with:
message).
Figure 5.6: Declaring a subtree in one method: the Configurable formatter setting example.
As an example, take a look at Figure 5.6, which shows the settings for the
refactoring browser configurable formatter. This sub-tree of settings is fully
declared in the method RBConfigurableFormatter class>>settingsOn: given below.
You can see that it declares the new root group #configurableFormatter with two
children, #formatCommentWithStatements and #indentString:
RBConfigurableFormatter class>>settingsOn: aBuilder
<systemsettings>
(aBuilder group: #configurableFormatter)
target: self;
parent: #refactoring;
label: 'Configurable Formatter' translated;
description: 'Settings related to the formatter' translated;
with: [
(aBuilder setting: #formatCommentWithStatements)
label: 'Format comment with statements' translated.
(aBuilder setting: #indentString)
label: 'Indent string' translated]
Optional sub-tree
Depending on the value of a particular preference, one might want to hide
some settings because it doesn't make sense to show them. As an example,
if the background color of the desktop is plain then it doesn't make sense to
show settings which are related to the gradient background. Instead, when
the user wants a gradient background, then a second color, the gradient direction, and the gradient origin settings should be presented. Look at
Figure 5.7:
on the left, the Gradient widget is unchecked, meaning that its actual
value is false; in this case, it has no children,
on the right, the Gradient widget is checked, then the setting value is
set to true and as a consequence, the settings useful to set a gradient
background are shown.
appearanceSettingsOn: aBuilder
<systemsettings>
(aBuilder group: #appearance)
label: 'Appearance' translated;
description: 'All settings concerned with the look''n feel of your system' translated;
noOrdering;
with: [... ]
You can indicate the order of a setting node among its siblings by sending
the message order: to it with a number passed as argument. The number can
be an Integer or a Float. Nodes with an order number are always placed before
others and are sorted according to their respective order number. If an order
is given to an item, then no ordering is applied for other siblings.
As an example, take a look at how the #standardFonts group is declared:
(aBuilder group: #standardFonts)
label: 'Standard fonts' translated;
target: StandardFonts;
parent: #appearance;
with: [
(aBuilder launcher: #updateFromSystem)
order: 1;
targetSelector: #current;
script: #updateFromSystem;
label: 'Update fonts from system' translated.
(aBuilder setting: #defaultFont)
label: 'Default' translated.
(aBuilder setting: #codeFont)
label: 'Code' translated.
(aBuilder setting: #listFont)
...
5.5
By default, the possible value set of a preference is not restricted and is given
by the actual type of the preference. For example, for a color preference,
the widget allows you to choose whatever color. For a number, the widget
allows the user to enter any number. But in some cases, only a particular
set of values is desired. As an example, for the standard browser or for
the user interface theme settings, the choice must be made among a finite
set of classes; for the FreeType cache size, only a range from 0 to 50,000 is
allowed. In these cases, it is much more comfortable if the widget can only
accept particular values. To address this issue, the domain value set can be
constrained either with a range or with a list of values.
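As a minimal sketch of both kinds of constraint, the hypothetical declaration below restricts one preference to a range and another to a list of values. MyAppSettings and the preferences cacheSize and logLevel are invented for illustration. The sketch also assumes that the builder understands range: and pickOne: and that the resulting nodes understand range: and domainValues:, so check these selectors in your image before relying on them.

```smalltalk
MyAppSettings class>>cacheSettingsOn: aBuilder
	"Hypothetical example: MyAppSettings, #cacheSize and #logLevel are not part of Pharo"
	<systemsettings>
	(aBuilder range: #cacheSize)
		target: MyAppSettings;
		label: 'Cache size' translated;
		range: (0 to: 50000). "only values in this interval are accepted"
	(aBuilder pickOne: #logLevel)
		target: MyAppSettings;
		label: 'Log level' translated;
		domainValues: #(#debug #info #warning) "only these values can be chosen"
```

With such a declaration the widget itself can refuse out-of-domain input, so the accessor methods of the target class do not have to validate values coming from the Settings Browser.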
From the Settings Browser point of view, the content of the list is exactly
the same and the user cannot notice any difference because, if an array of
Associations is given as argument to domainValues:, then the keys of the Associations are used for the user interface.
Concerning the value of the preference itself,
if you inspect
In this example, domainValues: takes an array of associations which is computed each time a Settings Browser is opened. Each association is made of
the name of the theme as key and of the class which implements the theme
as value.
5.6
Launching a script
Imagine that you want to launch an external configuration tool or that you
want to let users configure the system or a particular package with the
help of a script. In such a case you can declare a launcher. A launcher is
shown with a label as a regular setting except that no value is to be entered
for it. Instead, a button labelled Launch is integrated in the Settings Browser
and clicking on it launches the associated script.
As an example, to use True Type Fonts, the system must be updated by
collecting all the available fonts in the host system. This can be done by
evaluating the following expression:
FreeTypeFontProvider current updateFromSystem
It is possible to run this script from the Settings Browser. The corresponding
launcher is shown in Figure 5.10. The integration of such a launcher is quite
simple. You simply have to declare a setting for it! For example, look at how
the launcher for the TT fonts is declared:
GraphicFontSettings class>>standardFontsSettingsOn: aBuilder
<systemsettings>
(aBuilder group: #standardFonts)
...
(aBuilder launcher: #updateFromSystem) ...
target: FreeTypeFontProvider;
targetSelector: #current;
script: #updateFromSystem;
label: 'Update fonts from system' translated.
5.7
Even if many preferences have been removed from Pharo because they were
obsolete, there is still a large number of them. And even if the Settings
Browser is easy to use, it may be tedious to set up your own preferences, even
for a subset, each time you start working with a new image. A solution is to
implement a script to set all your preferred choices. The best way is to create
a specific class for that purpose. You can then include it in a package that
you can reload each time you want to set up a fresh image. We call this kind
of class a Setting style.
To manage Setting styles, the Settings Browser can be helpful in two ways.
First, it can help you discover how to change a preference value, and second,
it can create and update a particular style for you.
Scripting settings
Because preference variables are all accessible with accessor methods, it is
naturally possible to initialize a set of preferences in a simple script. For the
sake of simplicity, let's implement it in a Setting style.
As an example, a script can be implemented to change the background
color and to set all fonts to a bigger one than the default. Let's create a Setting
style class for that. We can call it MyPreferredStyle. The script is defined by a
method of MyPreferredStyle. We call this method loadStyle because this selector
is the standard hook for evaluating settings-related scripts.
MyPreferredStyle>>loadStyle
| f n |
"Desktop color"
PolymorphSystemSettings desktopColor: Color white.
"Bigger font"
n := StandardFonts defaultFont. "get the current default font"
f := LogicalFont familyName: n familyName pointSize: 12. "font for my preferred size"
StandardFonts setAllStandardFontsTo: f "reset all fonts"
PolymorphSystemSettings is the class in which all settings related to PolyMorph
are declared. StandardFonts is the class that is used to manage Pharo default
fonts.
Now, the question is: how can you find out that the desktop color setting is declared in
PolymorphSystemSettings and that the StandardFonts class manages fonts?
More generally, where are all these settings declared and managed?
The answer is quite simple: just use the Settings Browser! As explained
in Section 5.2, cmd-b or double clicking on an item opens a browser on the
declaration of the current setting node. You can also use the contextual menu
for that. Browsing the declaration will give you the target class (where the
preference variable is stored) and the selector for the preference value.
Now we would like MyPreferredStyle>>#loadStyle to be automatically evaluated when MyPreferredStyle is itself loaded in the system. For that purpose,
the only thing to do is to implement an initialize method for the MyPreferredStyle
class:
MyPreferredStyle class>>initialize
self new loadStyle
implement a method named styleName on the class side of your style class.
Concerning the example of previous section, it should be implemented as
follows:
MyPreferredStyle class>>styleName
"The style name used by the SettingBrowser"
<settingstyle>
^ 'My preferred style'
MyPreferredStyle class>>styleName takes no argument and must return the
name of your style as a String. The <settingstyle> pragma is used to let the
Settings Browser know that MyPreferredStyle is a setting style class.
Once this method is compiled, open the Settings Browser and pop up the
Style top menu. As shown in Figure 5.11, you should see a dialog with a list
of style names that includes your own.
Figure 5.11: The dialog for loading style with your own style
5.8
As explained in Section 5.2, the Settings Browser is by default able to manage simple preference types. These default possibilities are generally enough.
But there are some situations where it can be very helpful to be able to handle
more complex preference values.
As an example, let us focus on the text selection preferences. We have
the primary selection and three other optional kinds of text selection, the
secondary selection, the find and replace selection and the selection bar. For
all selections, a background color can be set. For the primary, the secondary
and the find and replace selection, a text color can also be chosen.
for a particular selection kind can be grouped together as children of a setting group. As an immediate improvement, for an optional text selection, a
boolean setting can be used instead of a simple group.
As an example, let's take the secondary selection. This text selection kind
is optional and one can set a background and a text color for it. Corresponding preferences are declared as instance variables of ThemeSettings. Their values can be read and changed from the current theme by getting its associated
ThemeSettings instance. Thus, the two color settings can be declared as children of the #useSecondarySelection boolean setting as given below:
(aBuilder setting: #useSecondarySelection)
target: UITheme;
targetSelector: #currentSettings;
label: 'Use the secondary selection' translated;
with: [
(aBuilder setting: #secondarySelectionColor)
label: 'Secondary selection color' translated.
(aBuilder setting: #secondarySelectionTextColor)
label: 'Secondary selection text color' translated].
Figure 5.12 shows these setting declarations in the Settings Browser. The
look and feel is clean but in fact two observations can be made:
1. it takes three lines for each selection kind. This is a little bit uncomfortable because the view for one selection takes a lot of vertical space,
2. the underlying model is not explicitly designed. The settings for one
selection kind are grouped together in the Settings Browser, but corresponding preference values are declared as separated instance variables of ThemeSettings. In the next section we see how to improve this
first solution with a better design.
Figure 5.12: The secondary selection settings declared with basic setting values
Here, you can notice that the preference is declared as optional and with no
text color.
For these preferences to be changeable from the Settings Browser, we have
to declare two methods. The first one is for the setting declaration and the
second is to implement the view.
The setting declaration is implemented as follows:
TextSelectionPreference class>>selectionPreferenceOn: aBuilder
As you can see, there is absolutely nothing new in this declaration. The only
thing that changes is that the values of the preferences are instances of a user-defined
class. In fact, for a user-defined or application-specific preference class,
the only particular thing to do is to implement one supplementary method
for the view. This method must be named settingInputWidgetForNode: and must
be implemented as a class method.
The responsibility of settingInputWidgetForNode: is to build the input
widget for the Settings Browser. This method takes a SettingDeclaration as argument. SettingDeclaration is basically a model and its instances are managed by
the Settings Browser.
Each SettingDeclaration instance serves as a preference value holder. Indeed, each setting that you can view in the Settings Browser is internally represented by a SettingDeclaration instance.
For each of our text selection preferences, we want to be able to change
its colors and, if the selection is optional, to enable or
disable it. Regarding the colors, depending on the selection preference
value, only the background color is always shown. Indeed, if the text color
of the preference value is nil, this means that having a text color does not
make sense and then the corresponding color chooser is not built.
The settingInputWidgetForNode: method can be implemented as below:
TextSelectionPreference class>>settingInputWidgetForNode: aSettingDeclaration
| preferenceValue backColorUI usedUI uiElements |
preferenceValue := aSettingDeclaration preferenceValue.
usedUI := self usedCheckboxForPreference: preferenceValue.
backColorUI := self backgroundColorChooserForPreference: preferenceValue.
uiElements := {usedUI. backColorUI},
(preferenceValue textColor
ifNotNil: [ { self textColorChooserForPreference: preferenceValue } ]
ifNil: [{}]).
^ (self theme newRowIn: self world for: uiElements)
cellInset: 20;
yourself
This method simply adds some basic elements in a row and returns the
row. First, you can notice that the actual preference value, an instance of
TextSelectionPreference, is obtained from the SettingDeclaration instance by sending #preferenceValue to it. Then, the user interface elements can be built based
on the actual TextSelectionPreference instance.
The first element is a checkbox or an empty space returned by the #usedCheckboxForPreference: invocation. This method is implemented as follows:
TextSelectionPreference class>>usedCheckboxForPreference: aSelectionPreference
^ aSelectionPreference optional
ifTrue: [self theme
newCheckboxIn: self world
for: aSelectionPreference
getSelected: #used
setSelected: #used:
getEnabled: #optional
label: ''
help: 'Enable or disable the selection']
ifFalse: [Morph new height: 1;
width: 30;
color: Color transparent]
The next elements are two color choosers. As an example, the background
color chooser is built as follows:
TextSelectionPreference class>>backgroundColorChooserForPreference:
aSelectionPreference
^ self theme
newColorChooserIn: self world
for: aSelectionPreference
getColor: #backgroundColor
setColor: #backgroundColor:
getEnabled: #used
help: 'Background color' translated
Now, in the Settings Browser, the user interface looks as shown in Figure 5.13,
with only one line for each selection kind instead of three as in our previous
version.
5.9
Chapter summary
We presented Settings, a new framework to manage preferences in a modular way. The key point of Settings is that it supports a modular flow of
control: a package is responsible for defining customization points and can use
them locally; Settings then makes it possible to describe such customization points. Finally, the Settings Browser collects such setting descriptions
and presents them to the user. The flow is then from the Settings Browser to
the customized packages.
Figure 5.13: The text selection settings implemented with a specific preference class
Chapter 6
6.1
Our job is to write a simple application that will generate a site map for a
web site that we have stored locally on our hard drive. The site map will
contain links to each of the HTML files in the web site, using the title of
the document as the text of the link. Furthermore, links will be indented to
reflect the directory structure of the web site.
The last method opens a browser to select the directory to open. Now, if
you inspect the result of WebDir selectHome, you will be prompted for
the directory containing your web pages, and you will be able to verify that
webDir and homePath are properly initialized to the directory holding your
web site and the full path name of this directory.
The * (known as the Kleene star, after Stephen Kleene, who invented it)
is a regex operator that will match the preceding regex any number of times
(including zero).
'' matchesRegex: 'x*' --> true
'x' matchesRegex: 'x*' --> true
'xx' matchesRegex: 'x*' --> true
'y' matchesRegex: 'x*' --> false
Now let's check that our regex for HTML files works as expected.
If you send htmlFiles to a WebDir instance and print it, you should see something like this:
(WebDir onPath: '...') htmlFiles
#('index.html' ...)
Now listing the HTML files should work just as it did before, except that
we reuse the same regex object many times.
correctly generate links from the root of the web site to the files it contains.) Define
an initialization method on the instance side and a creation method on the class side.
WebPage>>initializePath: filePath homePath: dirPath
path := filePath.
homePath := dirPath
WebPage class>>on: filePath forHome: homePath
^ self new initializePath: filePath homePath: homePath
A WebDir instance should be able to return a list of all the web pages it
contains.
Add the following method to WebDir, and inspect the return value to verify that
it works correctly.
WebDir>>webPages
^ self htmlFiles collect:
[ :each | WebPage
on: webDir fullName, '/', each
forHome: homePath ]
String substitutions
That's not very informative, so let's use a regex to get the actual file name
for each web page. To do this, we want to strip away all the characters from
the path name up to the last directory. On a Unix file system directories end
with a slash (/), so we need to delete everything up to the last slash in the file
path.
The String extension method copyWithRegex:matchesReplacedWith: does what
we want:
'hello' copyWithRegex: '[elo]+' matchesReplacedWith: 'i'
'hi'
In this example the regex [elo] matches any of the characters e, l or o. The
operator + is like the Kleene star, but it matches one or more instances
of the regex preceding it. Here it will match the entire substring 'ello' and
replace it in a fresh string with the letter i.
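To see what the + contributes, it can help to compare the same replacement with and without it. This is only a small illustrative sketch; the result comments follow from the matching rules just described.

```smalltalk
"Without +, every matching character is a separate match and is replaced on its own"
'hello' copyWithRegex: '[elo]' matchesReplacedWith: 'i'.    "'hiiii'"
"With +, the whole run 'ello' is a single match and is replaced once"
'hello' copyWithRegex: '[elo]+' matchesReplacedWith: 'i'.   "'hi'"
```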
Add the following method and verify that it works as expected.
WebPage>>fileName
^ path copyWithRegex: '.*/' matchesReplacedWith: ''
Now you should see something like this on your test web site:
(WebDir onPath: '...') webPages collect: [:each | each fileName ]
#('index.html' ...)
Actually, you might have problems if your web pages contain non-ascii
characters, in which case you might be better off with the following code:
WebPage>>contents
^ (FileStream oldFileOrNoneNamed: path)
converter: Latin1TextConverter new;
contents
'<head>
Now let's extract the title. In this case we are looking for the text that
occurs between the HTML tags <title> and </title>.
What we need is a way to extract part of the match of a regular expression.
Subexpressions of regexes are delimited by parentheses. Consider the regex
([^aeiou]+)([aeiou]+). It consists of two subexpressions, the first of which will
match a sequence of one or more non-vowels, and the second of which will
match one or more vowels. (The operator ^ at the start of a bracketed set of
characters negates the set. 3 )
3 NB: In Pharo the caret is also the return keyword, which we write as ^. The caret used within regular expressions to negate sets of
characters is the very same character.
Now we will try to match a prefix of the string 'pharo' and extract the submatches:
re := '([^aeiou]+)([aeiou]+)' asRegex.
re matchesPrefix: 'pharo' --> true
re subexpression: 1 --> 'pha'
re subexpression: 2 --> 'ph'
re subexpression: 3 --> 'a'
After successfully matching a regex against a string, you can always send
it the message subexpression: 1 to extract the entire match. You can also send
subexpression: n, where n ranges from 2 up to one more than the number of parenthesized subexpressions in the regex.
The regex above has two subexpressions, numbered 2 and 3.
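The numbering can be checked on another small regex. The sketch below is illustrative only: the input string 'alice@example' is made up, and the regex is deliberately simplistic rather than a realistic address validator.

```smalltalk
| re |
re := '(\w+)@(\w+)' asRegex.
re matches: 'alice@example'.   "true"
re subexpression: 1.           "'alice@example' (index 1 is always the entire match)"
re subexpression: 2.           "'alice' (first parenthesized subexpression)"
re subexpression: 3.           "'example' (second parenthesized subexpression)"
```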
We will use the same trick to extract the title from an HTML file.
Define the following method:
WebPage>>title
| re |
re := '[\w\W]*<title>(.*)</title>' asRegexIgnoringCase.
^ (re matchesPrefix: self contents)
ifTrue: [ re subexpression: 2 ]
ifFalse: [ '(', self fileName, ' -- untitled)' ]
HTML does not care whether tags are upper or lower case, so we must
make our regex case insensitive by instantiating it with asRegexIgnoringCase.
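A quick way to convince yourself of the difference is to match an upper-case tag against both kinds of matcher. This is a minimal sketch using only the two conversion messages named in the text.

```smalltalk
"A case-sensitive matcher does not accept the upper-case tag"
'<title>' asRegex matches: '<TITLE>'.               "false"
"A case-insensitive matcher accepts it"
'<title>' asRegexIgnoringCase matches: '<TITLE>'.   "true"
```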
Now we can test our title extractor, and we should see something like
this:
(WebDir onPath: '...') webPages first title
'Home page'
The first result would give us an absolute path, which is probably not
what we want.
Define the following methods:
WebPage>>relativePath
^ path
copyWithRegex: homePath , '/'
matchesReplacedWith: ''
WebPage>>link
^ '<a href="', self relativePath, '">', self title, '</a>'
We need to generate HTML bullet lists containing links for each web page
of a web directory. Subdirectories should be indented in their own bullet
list.
WebDir>>printTocOn: aStream
self htmlFiles
ifNotEmpty: [
aStream nextPutAll: '<ul>'; cr.
self webPages
do: [:each | aStream nextPutAll: '<li>';
nextPutAll: each link;
nextPutAll: '</li>'; cr].
self webDirs
do: [:each | each printTocOn: aStream].
aStream nextPutAll: '</ul>'; cr]
We create a file called toc.html in the root web directory and dump the
site map there.
WebDir>>tocFileName
^ 'toc.html'
WebDir>>makeToc
| tocStream |
tocStream := (webDir / self tocFileName) writeStream.
self printTocOn: tocStream.
tocStream close.
6.2
Regex syntax
We will now have a closer look at the syntax of regular expressions as supported by the Regex package.
The simplest regular expression is a single character. It matches exactly
that character. A sequence of characters matches a string with exactly the
same sequence of characters:
'a' matchesRegex: 'a' --> true
'foobar' matchesRegex: 'foobar' --> true
'blorple' matchesRegex: 'foobar' --> false
We have already seen the Kleene star (*) and the + operator. A regular
expression followed by an asterisk matches any number (including 0) of
matches of the original expression. For example:
'ab' matchesRegex: 'a*b' --> true
'aaaaab' matchesRegex: 'a*b' --> true
'b' matchesRegex: 'a*b' --> true
'aac' matchesRegex: 'a*b' --> false
The Kleene star has higher precedence than sequencing. A star applies to
the shortest possible subexpression that precedes it. For example, ab* means
a followed by zero or more occurrences of b, not zero or more occurrences
of ab:
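The following sketch illustrates this precedence rule. The third line additionally assumes that parentheses group a subexpression, as they do in the subexpression examples elsewhere in this chapter, so that the star applies to the whole of ab.

```smalltalk
"The star binds only to b: an a followed by any number of b's"
'abbb' matchesRegex: 'ab*'.     "true"
'abab' matchesRegex: 'ab*'.     "false"
"Parentheses make the star apply to the sequence ab"
'abab' matchesRegex: '(ab)*'.   "true"
```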
The last operator is |, which expresses choice between two subexpressions. It matches a string if either of the two subexpressions matches the
string. It has the lowest precedence, even lower than sequencing. For example, ab*|ba* means a followed by any number of b's, or b followed by any
number of a's:
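A minimal sketch of the choice operator at work, with results that follow directly from the reading of ab*|ba* given above:

```smalltalk
"Left alternative: a followed by b's"
'abb' matchesRegex: 'ab*|ba*'.    "true"
"Right alternative: b followed by a's"
'baa' matchesRegex: 'ab*|ba*'.    "true"
"Neither alternative covers the whole string"
'baab' matchesRegex: 'ab*|ba*'.   "false"
```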
A bit more complex example is the expression c(a|d)+r, which matches the
name of any of the Lisp-style car, cdr, caar, cadr, ... functions:
'car' matchesRegex: 'c(a|d)+r' --> true
'cdr' matchesRegex: 'c(a|d)+r' --> true
'cadr' matchesRegex: 'c(a|d)+r' --> true
Using the plus operator, we can build the following binary number recognizer:
'10010100' matchesRegex: '[01]+' --> true
'10001210' matchesRegex: '[01]+' --> false
If the first character after the opening bracket is ^, the set is inverted: it
matches any single character not appearing between the brackets:
'0' matchesRegex: '[^01]' --> false
'3' matchesRegex: '[^01]' --> true
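Syntax      What it represents
a           literal match of the character a
.           match any character (except newline)
( )         group a subexpression
\           escape the following special character
*           Kleene star: match the previous regex zero or more times
+           match the previous regex one or more times
?           match the previous regex zero times or once
|           match either the left or the right regex
[abcd]      match any one of the characters a, b, c, d
[^abcd]     match any one character other than a, b, c, d
[0-9]       match any one character in the range 0 to 9
\w          match alphanumeric
\W          match non-alphanumeric
\d          match digit
\D          match non-digit
\s          match space
\S          match non-space
Table 6.1: Regex Syntax in a Nutshell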
Character classes
Regular expressions can also include the following backslash escapes to refer to popular classes of characters: \w to match alphanumeric characters, \d
to match digits, and \s to match whitespace. Their upper-case variants, \W, \D
and \S, match the complementary characters (non-alphanumerics, non-digits
and non-whitespace). Table 6.1 gives a summary of the syntax seen so far.
As mentioned in the introduction, regular expressions are especially useful for validating user input, and character classes turn out to be especially
useful for defining such regexes. For example, non-negative numbers can be
matched with the regex \d+:
'42' matchesRegex: '\d+' --> true
'-1' matchesRegex: '\d+' --> false
Better yet, we might want to specify that non-zero numbers should not
start with the digit 0:
'0' matchesRegex: '0|([1-9]\d*)' --> true
'1' matchesRegex: '0|([1-9]\d*)' --> true
'42' matchesRegex: '0|([1-9]\d*)' --> true
'099' matchesRegex: '0|([1-9]\d*)' --> false "leading 0"
Floating point numbers should require at least one digit after the dot:
'0' matchesRegex: '(0|((\+|-)?[1-9]\d*))(\.\d+)?' --> true
'0.9' matchesRegex: '(0|((\+|-)?[1-9]\d*))(\.\d+)?' --> true
'3.14' matchesRegex: '(0|((\+|-)?[1-9]\d*))(\.\d+)?' --> true
'-42' matchesRegex: '(0|((\+|-)?[1-9]\d*))(\.\d+)?' --> true
'2.' matchesRegex: '(0|((\+|-)?[1-9]\d*))(\.\d+)?' --> false
Syntax       What it represents
[:alnum:]    any alphanumeric
[:alpha:]    any alphabetic character
[:cntrl:]    any control character (ascii code is < 32)
[:digit:]    any decimal digit
[:graph:]    any graphical character (ascii code >= 32)
[:lower:]    any lowercase character
[:print:]    any printable character (here, the same as [:graph:])
[:punct:]    any punctuation character
[:space:]    any whitespace character
[:upper:]    any uppercase character
[:xdigit:]   any hexadecimal character
Table 6.2: Regex character classes
Note that these elements are components of the character classes, i.e., they
have to be enclosed in an extra set of square brackets to form a valid regular
expression. For example, a non-empty string of digits would be represented
as [[:digit:]]+. The above primitive expressions and operators are common to
many implementations of regular expressions.
'42' matchesRegex: '[[:digit:]]+' --> true
Matching boundaries
The last group of special primitive expressions is shown in Table 6.3, and is
used to match boundaries of strings.
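Syntax   What it represents
^        match an empty string at the beginning of a line
$        match an empty string at the end of a line
\b       match an empty string at a word boundary
\B       match an empty string not at a word boundary
\<       match an empty string at the beginning of a word
\>       match an empty string at the end of a word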
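The boundary primitives are easiest to understand by searching a sentence with and without them. The sketch below reuses the sample sentence from the matcher examples in this chapter; the result comments assume the usual meaning of these primitives (word boundaries for \< and \>, beginning and end of line for ^ and $).

```smalltalk
"\< and \> only match at the beginning and at the end of a word"
'\<hates\>' asRegex search: 'Ignatz hates Krazy'.   "true"
'\<ate\>' asRegex search: 'Ignatz hates Krazy'.     "false: 'ate' occurs only inside 'hates'"
"^ and $ anchor the match to the beginning and to the end of a line"
'^Ignatz' asRegex search: 'Ignatz hates Krazy'.     "true"
'^Krazy' asRegex search: 'Ignatz hates Krazy'.      "false"
'Krazy$' asRegex search: 'Ignatz hates Krazy'.      "true"
```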
6.3
Regex API
regexes.
Enumeration interface
Some applications need to access all matches of a certain regular expression
within a string. The matches are accessible using a protocol modeled after
the familiar Collection-like enumeration protocol.
regex:matchesDo: evaluates a one-argument aBlock for every match of the
regular expression within the receiver string.
list := OrderedCollection new.
'Jack meet Jill' regex: '\w+' matchesDo: [:word | list add: word].
list --> an OrderedCollection('Jack' 'meet' 'Jill')
regex:matchesCollect: evaluates a one-argument aBlock for every match of
the regular expression within the receiver string. It then collects the results
and answers them as a SequenceableCollection.
'Jack meet Jill' regex: '\w+' matchesCollect: [:word | word size] --> an OrderedCollection(4 4 4)
allRegexMatches: returns a collection of all matches (substrings of the receiver string) of the regular expression.
'Jack and Jill went up the hill' allRegexMatches: '\w+' --> an OrderedCollection('Jack' 'and' 'Jill' 'went' 'up' 'the' 'hill')
All messages of enumeration and replacement protocols perform a case-sensitive match. Case-insensitive versions are not provided as part of a String
protocol. Instead, they are accessible using the lower-level matching interface presented in the following section.
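As a small sketch of the difference, the first expression below uses the String protocol and the second a case-insensitive matcher. The sample string is made up for illustration; matchesIn: is the matcher message that collects all matches found in a string.

```smalltalk
"String protocol: always case-sensitive, so nothing is found"
'Jack and JILL' allRegexMatches: 'jill'.                 "an empty collection"
"Matcher built with asRegexIgnoringCase: the same search succeeds"
'jill' asRegexIgnoringCase matchesIn: 'Jack and JILL'.   "an OrderedCollection('JILL')"
```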
Lower-level interface
When you send the message matchesRegex: to a string, the following happens:
1. A fresh instance of RxParser is created, and the regular expression string
is passed to it, yielding the expression's syntax tree.
2. The syntax tree is passed as an initialization parameter to an instance
of RxMatcher. The instance sets up some data structure that will work
as a recognizer for the regular expression described by the tree.
3. The original string is passed to the matcher, and the matcher checks for
a match.
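The three steps above can be reproduced by hand. This is a sketch under assumptions rather than the exact code behind matchesRegex:, which is not shown here: it assumes that RxParser understands parse: and that RxMatcher has a class-side for: that accepts the resulting syntax tree.

```smalltalk
| tree matcher |
"Step 1: parse the regex string into a syntax tree (assumed selector: parse:)"
tree := RxParser new parse: 'a*b'.
"Step 2: build a matcher from the syntax tree (assumed selector: for:)"
matcher := RxMatcher for: tree.
"Step 3: ask the matcher whether the whole string matches"
matcher matches: 'aaab'.   "true"
```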
The Matcher
If you repeatedly match a number of strings against the same regular expression using one of the messages defined in String, the regular expression string
is parsed and a new matcher is created for every match. You can avoid this
overhead by building a matcher for the regular expression, and then reusing
the matcher over and over again. You can, for example, create a matcher at a
class or instance initialization stage, and store it in a variable for future use.
You can create a matcher using one of the following methods:
You can send asRegex or asRegexIgnoringCase to the string.
You can directly instantiate a RxMatcher using one of its class methods:
forString: or forString:ignoreCase: (which is what the convenience methods
above will do).
Here we send matchesIn: to collect all the matches found in a string:
octal := '8r[0-7]+' asRegex.
octal matchesIn: '8r52 = 16r2A' --> an OrderedCollection('8r52')
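To illustrate the benefit of reusing a matcher, the sketch below builds one matcher and applies it to several strings, so the regex string is parsed only once. The variable name and the file names are made up for illustration.

```smalltalk
| htmlRegex |
"Build the matcher once..."
htmlRegex := '.*\.html' asRegex.
"...and reuse it for every element of the collection"
#('index.html' 'logo.png' 'about.html')
	select: [ :each | htmlRegex matches: each ].   "#('index.html' 'about.html')"
```

Writing each matchesRegex: '.*\.html' inside the block would give the same answer, but would parse the regex and create a new matcher for each element.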
Matching
A matcher understands these messages (all of them return true to indicate
successful match or search, and false otherwise):
matches: aString answers true if the whole argument string (aString) matches.
'\w+' asRegex matches: 'Krazy' --> true
matchesPrefix: aString answers true if some prefix of the argument string (not necessarily the whole string) matches.
'\w+' asRegex matchesPrefix: 'Ignatz hates Krazy' --> true
search: aString searches the string for the first occurrence of a matching
substring. (Note that the first two methods only try matching from the very
beginning of the string.) For example, a matcher for a+ would answer true
when sent search: with the string 'baaa', while the previous two
messages would answer false.
'\b[a-z]+\b' asRegex search: 'Ignatz hates Krazy' --> true "finds 'hates'"
The matcher also stores the outcome of the last match attempt and can
report it: lastResult answers a Boolean: the outcome of the most recent match
attempt. If no matches were attempted, the answer is unspecified.
number := '\d+' asRegex.
number search: 'Ignatz throws 5 bricks'.
number lastResult --> true
matchesStream:, matchesStreamPrefix: and searchStream: are analogous to the
above three messages, but take streams as their argument.
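As a brief sketch, the stream variants can be tried on the same sample strings as above by sending readStream to each string first:

```smalltalk
"Whole stream content must match"
'\w+' asRegex matchesStream: 'Krazy' readStream.                     "true"
"Only a prefix of the stream content must match"
'\w+' asRegex matchesStreamPrefix: 'Ignatz hates Krazy' readStream.  "true"
"The match may start anywhere in the stream"
'\d+' asRegex searchStream: 'Ignatz throws 5 bricks' readStream.     "true"
```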
Subexpression matches
After a successful match attempt, you can query which part of the original
string has matched which part of the regex. A subexpression is a parenthesized part of a regular expression, or the whole expression. When a regular
expression is compiled, its subexpressions are assigned indices starting from
1, depth-first, left-to-right.
For example, the regex ((\d+)\s*(\w+)) has four subexpressions, including
itself.
1: ((\d+)\s*(\w+))
2: (\d+)\s*(\w+)
3: \d+
4: \w+
The highest valid index is equal to 1 plus the number of matching parentheses. (So, 1 is always a valid index, even if there are no parenthesized
subexpressions.)
After a successful match, the matcher can report what part of the original
string matched what subexpression. It understands these messages:
subexpressionCount answers the total number of subexpressions: the highest value that can be used as a subexpression index with this matcher. This
value is available immediately after initialization and never changes.
subexpression: takes a valid index as its argument, and may be sent only
after a successful match attempt. The method answers a substring of the
original string the corresponding subexpression has matched to.
subBeginning: and subEnd: answer the positions within the argument string
or stream where the given subexpression match has started and ended, respectively.
items := '((\d+)\s*(\w+))' asRegex.
items search: 'Ignatz throws 1 brick at Krazy'.
items subexpressionCount --> 4
items subexpression: 1 --> '1 brick' "complete expression"
items subexpression: 2 --> '1 brick' "top subexpression"
items subexpression: 3 --> '1' "first leaf subexpression"
items subexpression: 4 --> 'brick' "second leaf subexpression"
items subBeginning: 3 --> an OrderedCollection(14)
items subEnd: 3 --> an OrderedCollection(15)
items subBeginning: 4 --> an OrderedCollection(16)
items subEnd: 4 --> an OrderedCollection(21)
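Index 1 is valid even when the regex contains no parentheses at all, because the whole expression counts as a subexpression. A minimal sketch of that case, with an illustrative variable name:

```smalltalk
| digits |
digits := '\d+' asRegex.
digits subexpressionCount.               "1"
digits search: 'Ignatz throws 5 bricks'.
digits subexpression: 1.                 "'5'"
```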
As a more elaborate example, the following piece of code uses a MMM DD,
YYYY date format recognizer to convert a date to a three-element array with
year, month, and day strings:
date := '(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(\d\d?)\s*,\s*19(\d\d)' asRegex.
result := (date matches: 'Aug 6, 1996')
    ifTrue: [ { (date subexpression: 4) .
        (date subexpression: 2) .
        (date subexpression: 3) } ]
    ifFalse: [ 'no match' ].
result --> #('96' 'Aug' '6')
There are also the following methods for iterating over matches within
streams: matchesOnStream:, matchesOnStream:do:, matchesOnStream:collect:,
copyStream:to:replacingMatchesWith: and copyStream:to:translatingMatchesUsing:.
in := ReadStream on: '12 drummers, 11 pipers, 10 lords, 9 ladies, etc.'.
out := WriteStream on: ''.
numMatch := '\<\d+\>' asRegex.
numMatch
    copyStream: in
    to: out
    translatingMatchesUsing: [ :each | each asNumber asFloat asString ].
out close; contents --> '12.0 drummers, 11.0 pipers, 10.0 lords, 9.0 ladies, etc.'
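The do: variant can be sketched in the same spirit. The example below assumes that matchesOnStream:do: evaluates its block once per match, passing the matched string as argument; the variable name numbers is illustrative.

```smalltalk
| numbers |
numbers := OrderedCollection new.
'\d+' asRegex
    matchesOnStream: (ReadStream on: '12 drummers, 11 pipers, 10 lords')
    do: [ :each | numbers add: each asNumber ].
numbers   "expected to hold 12, 11 and 10, in that order"
```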
Error Handling
Several exceptions may be raised by RxParser when building regexes. The
exceptions have the common parent RegexError. You may use the usual
Smalltalk exception handling mechanism to catch and handle them.
RegexSyntaxError is raised if a syntax error is detected while parsing a
regex.
RegexCompilationError is raised if an error is detected while building a
matcher.
RegexMatchingError is raised if an error occurs while matching (for example, if a bad selector was specified using ':<selector>:' syntax, or because
of the matcher's internal error).
['+' asRegex] on: RegexError do: [:ex | ^ ex printString ]
--> 'RegexSyntaxError: nullable closure'
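When a pattern comes from user input, it can be convenient to turn a syntax error into a fallback value instead of opening a debugger. A minimal sketch, assuming only the exception classes named above; the variable name matcher is illustrative.

```smalltalk
| matcher |
matcher := [ '+' asRegex ]
    on: RegexSyntaxError
    do: [ :ex | ex return: nil ].   "answer nil instead of a matcher"
matcher isNil   "true, since '+' alone is not a valid regex"
```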
6.4
Chapter summary
Acknowledgments. Since the first release of the matcher, thanks to the input from several fellow Smalltalkers, I became convinced a native Smalltalk
regular expression matcher was worth the effort to keep it alive. For the
advice and encouragement that made this release possible, I want to thank:
Felix Hack, Eliot Miranda, Robb Shecter, David N. Smith, Francis Wolinski
and anyone whom I haven't yet met or heard from, but who agrees this has
not been a complete waste of time.
6.5
Chapter summary
Part II
Source Management
Chapter 7
text, are the units of change. In this chapter we will use SmalltalkHub, but
Squeaksource 3 can be used similarly. SmalltalkHub is a central online repository in which you can store versions of your applications using Monticello.
SmalltalkHub is the equivalent of SourceForge, and Monticello the equivalent of CVS.
In this chapter, you will learn how to use Monticello and
SmalltalkHub to manage your software. We have already been acquainted
with Monticello briefly in earlier chapters. This chapter delves into the details of Monticello and describes some additional features that are useful for
versioning large applications.
7.1
Basic usage
Of course these tests will fail as we have not yet implemented the isPerfect
method for integers. We would like to put this code under the control of
Monticello as we revise and extend it.
Launching Monticello
Monticello is included in the standard Pharo distribution. Monticello Browser
can be selected from the World menu. In Figure 7.1, we see that the Monticello Browser consists of two list panes and one button pane. The left pane
lists installed packages and the right pane shows known repositories. Various operations may be performed via the button pane and the menus of the
two list panes.
Creating a package
Monticello manages versions of packages. A package is essentially a named
set of classes and methods. In fact, a package is an object, an instance of
PackageInfo, that knows how to identify the classes and methods that belong to it.
We would like to version our PerfectTest class. The right way to do this
is to define a package called Perfect containing PerfectTest and all the related classes and methods we will introduce later. For the moment, no such
package exists. We only have a category called (not coincidentally) Perfect.
This is perfect, since Monticello will map categories to packages for us.
Press the +Package button in the Monticello browser and enter Perfect.
Voilà! You have just created the Perfect Monticello package.
Monticello packages follow a number of important naming conventions
for class and method categories. Our new package named Perfect contains:
All classes in the category Perfect , or in categories whose names start
with Perfect-. For now this includes only our PerfectTest class.
All methods belonging to any class (in any category) that are defined in
a protocol named *perfect or *Perfect , or in protocols whose names start
with *perfect- or *Perfect-. Such methods are known as extensions. We
don't have any yet, but we will define some very soon.
All methods belonging to any classes in the category Perfect , or in categories whose names begin with Perfect-, except those in protocols whose
Committing changes
Note in Figure 7.2 that the Save button is disabled (greyed out).
Before we save our Perfect package, we need to specify where we want
to save it. A repository is a package container, which may either be local to
your machine or remote (accessed over the network). Various protocols may
be used to establish a connection between your Pharo image and a repository. As we will see later (Section 7.5), Monticello supports a large choice of
repositories, though the most commonly used is HTTP, since this is the one
used by SmalltalkHub.
At least one repository, called package-cache, is set up by default, and
is shown as the first entry in the list of repositories on the right-hand side
of your Monticello browser (see Figure 7.1). The package-cache is created
automatically in the local directory where your Pharo image is located. It
will contain a copy of all the packages you download from remote repositories. By default, copies of your packages are also saved in the package-cache
when you save them to a remote server.
Each package knows which repositories it can be saved to. To add a new
repository to the selected package, press the +Repository button. This will
offer a number of choices of different kinds of repository, including HTTP.
For the rest of the chapter we will work with the package-cache repository, as
this is all we need to explore the features of Monticello.
Select the directory repository named package-cache, press Save, enter an appropriate log message, and Accept to save the changes.
Figure 7.3: You may set a new version name and a commit message when
you save a version of a package.
The Perfect package is now saved in package-cache, which is nothing more
than a directory contained in the same directory as your Pharo image. Note,
however, that if you use any other kind of repository (e.g., HTTP, FTP, another local directory), a copy of your package will also be saved in the
package-cache.
Use your favorite file browser (e.g., Windows Explorer, Finder or XTerm) to
confirm that a file Perfect-XX.1.mcz was created in your package cache. XX corresponds to your name or initials.8
A version is an immutable snapshot of a package that has been written
to a repository. Each version has a unique version number to identify it in a
repository. Be aware, however, that this number is not globally unique: in
another repository you might have the same file identifier for a different snapshot. For example, Perfect-onierstrasz.1.mcz in another repository might be the
final, deployed version of our project! When saving a version into a repository, the next available number is automatically assigned to the version, but
you can change this number if you wish. Note that version branches do
not interfere with the numbering scheme (as with CVS or Subversion). As
we shall see later, versions are by default ordered by their version number
when viewing a repository.
Class extensions
Let's implement the methods that will make our tests green.
Define the following two methods in the class Integer, and put each method in
a protocol called *perfect. Also add the new boundary tests. Check that the tests are
now green.
8 In the past, the convention was for developers to log their changes using only their initials.
Now, with many developers sharing identical initials, the convention is to use an identifier
based on the full name, such as apblack or AndrewBlack.
Integer>>isPerfect
    ^ self > 1 and: [ self divisors sum = self ]
Integer>>divisors
    ^ (1 to: self - 1) select: [ :each | (self rem: each) = 0 ]
PerfectTest>>testPerfectBoundary
    self assert: 0 isPerfect not.
    self assert: 1 isPerfect not.
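Once these methods are installed, a quick workspace check can complement the unit tests. This is only a sketch and relies on the isPerfect and divisors extensions defined above.

```smalltalk
6 divisors.                                  "#(1 2 3)"
28 divisors sum.                             "28"
(1 to: 30) select: [ :n | n isPerfect ].     "#(6 28)"
```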
a version name displayed with a normal typeface shows an older version than the installed current one.
Action-clicking the right-hand side of the inspector opens a menu with
different sorting options. The unchanged entry in the menu discards any particular sorting. It uses the order given by the repository.
Branching
A branch is a line of development versions that exists independently of another line, yet still shares a common ancestor version if you look far enough
back in time.
You may create a new version branch when saving your package. Branching is useful when you want to have a new parallel development. For example, suppose your job is doing software maintenance in your company. One
day a different division asks you for the same software, but with a few parts
tweaked for them, since they do things slightly differently. The way to deal
with this situation is to create a second branch of your program that incorporates the tweaks, while leaving the first branch unmodified.
From the repository inspector, select version 1 of the Perfect package and Load
it. Version 2 should again be displayed in bold, indicating that it is no longer loaded
(since it is not an ancestor of version 1). Now implement the following two Integer
methods and place them in the *perfect protocol, and also modify the existing
PerfectTest test method as follows:
Integer>>isPerfect
    self < 2 ifTrue: [ ^ false ].
    ^ self divisors sum = self
Integer>>divisors
    ^ (1 to: self - 1) select: [ :each | (self \\ each) = 0 ]
PerfectTest>>testPerfect
    self assert: 2 isPerfect not.
    self assert: 6 isPerfect.
    self assert: 7 isPerfect not.
    self assert: 28 isPerfect.
Once again the tests should be green, though our implementation of perfect numbers is slightly different.
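In divisors, the only change is the use of \\ instead of rem:. For the positive operands used here the two messages answer the same value; they differ only when the signs are mixed, as this small sketch shows.

```smalltalk
7 rem: 2.      "1"
7 \\ 2.        "1"
-7 rem: 2.     "-1: rem: pairs with quo:, which truncates toward zero"
-7 \\ 2.       "1: \\ pairs with //, which rounds toward negative infinity"
```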
Attempt to load version 2 of the Perfect package.
Now you should get a warning that you have unsaved changes.
Merging
You can merge one version of a package with another using the Merge button in the Monticello browser. Typically, you will want to do this when (i)
Figure 7.12: Version 2 of the Perfect package being merged with the current
version 3.
In Figure 7.12 we see the three differences between versions 2 and 3 of
7.2
Monticello has many other useful features. As we can see in Figure 7.1, the
Monticello browser window has eight buttons. We have already used four of
them: +Package, Save, +Repository and Open. We will now look at Browse
and Changes which are used to explore the state and history of repositories
Figure 7.13: All older versions are now ancestors of merged version 4.
Browse
The Browse button opens a snapshot browser to display the contents of
a package. The advantage of the snapshot browser over the browser is its
ability to display class extensions.
Select the Perfect package and click the Browse button.
Figure 7.14: The snapshot browser reveals that the Perfect package extends
the class Integer with 2 methods.
For example, Figure 7.14 shows the class extensions defined in the Perfect
package. Note that code cannot be edited here, though by action-clicking (if
your environment has been set up accordingly) on a class or a method name
you can open a regular browser.
Changes
The Changes button computes the difference between the code in the image
and the most recent version of the package in the repository.
Make the following changes to PerfectTest, and then click the Changes button
in the Monticello browser.
PerfectTest>>testPerfect
    self assert: 2 isPerfect not.
    self assert: 6 isPerfect.
    self assert: 7 isPerfect not.
    self assert: 496 isPerfect.
PerfectTest>>testPerfectTo1000
    self assert: ((1 to: 1000) select: [ :each | each isPerfect ]) = #(6 28 496)
Figure 7.15: The patch browser shows the difference between the code in the
image and the most recently committed version.
Figure 7.15 shows that the Perfect package has been locally modified with
one changed method and one new method. As usual, action-clicking on a
change offers you a choice of contextual operations.
7.3
Advanced topics
Now we will have a look at several advanced topics, including history, managing dependencies, making configurations, and class initialization.
History
By action-clicking on a package, you can select the item History . It opens
a version history viewer that displays the comments committed along with
each version of the selected package (see Figure 7.16). The versions of the
package, in this case Perfect, are listed on the left, while information about
the selected version is displayed on the right.
Select the Perfect package, right click and select the History item.
Figure 7.16: The version history viewer provides information about the various versions of a package.
By action-clicking on a particular version, you can explore the changes
with respect to the current working copy of the package loaded in the image,
or spawn a new history browser relative to the selected version.
Dependencies
Most applications cannot live on their own and typically require the presence of other packages in order to work properly. For example, let us have a
look at Pier9 , a meta-described content management system. Pier is a large
piece of software with many facets (tools, documentation, blog, caching strategies, security, etc.). Each facet is implemented by a separate package. Most
Pier packages cannot be used in isolation since they refer to methods and
classes defined in other packages. Monticello provides a dependency mechanism for declaring the required packages of a given package to ensure that it
will be correctly loaded.
Essentially, the dependency mechanism ensures that all required packages of a package are loaded before the package is loaded itself. Since required packages may themselves require other packages, the process is applied recursively to a tree of dependencies, ensuring that the leaves of the
tree are loaded before any branches that depend on them. Whenever new
versions of required packages are checked in, then new versions of the packages that depend on them will automatically depend on the new versions.
9 http://source.lukas-renggli.ch/pier
Dependencies cannot be expressed across repositories. All requiring and required packages must live in the same
repository.
Figure 7.17 illustrates how this works in Pier. Package Pier-All is an empty
package that acts as a kind of umbrella. It requires Pier-Blog, Pier-Caching and
all the other Pier packages.
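The load order implied by such a dependency tree can be sketched with a few lines of workspace code. This is a toy model, not Monticello's actual implementation: the dictionary requires and the block load are illustrative names, only the three package names come from the text, and the sketch assumes there are no cyclic dependencies.

```smalltalk
| requires loaded load |
requires := Dictionary new.
requires at: 'Pier-All' put: #('Pier-Blog' 'Pier-Caching').
requires at: 'Pier-Blog' put: #().
requires at: 'Pier-Caching' put: #().
loaded := OrderedCollection new.
load := nil.
load := [ :name |
    (loaded includes: name) ifFalse: [
        "First load everything this package requires, then the package itself"
        (requires at: name ifAbsent: [ #() ]) do: [ :each | load value: each ].
        loaded add: name ] ].
load value: 'Pier-All'.
loaded   "an OrderedCollection('Pier-Blog' 'Pier-Caching' 'Pier-All')"
```

The required packages appear before the umbrella package, which is the leaves-first order described above.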
To commit the change, you should save NewPerfect-All. This will commit a new version of NewPerfect-All which then requires the new version
of NewPerfect-Tests. (It will also depend on the existing, unmodified version of NewPerfect-Extensions.) Loading the latest version of NewPerfect-All will also load the latest version of the required packages.
If instead you save NewPerfect-Tests, this will not cause NewPerfect-All to
be saved. This is bad because you effectively break the dependency. If
you then load the latest version of NewPerfect-All you will not get the
latest versions of the required packages. Don't do it!
Do not name your top level package with a suffix (e.g.,
Perfect) that could match your subpackages. Do not define Perfect as a required package of Perfect-Extensions or
PerfectTest. You would run into trouble as Monticello
would save all the classes for three packages, though you
only want two packages and an empty one at the top
level.
To build more flexible dependencies between packages, we recommend
using a Metacello configuration (see Chapter 9). The +Config button creates
a kind of configuration structure. The only thing to do is to add the dependencies.
Class initialization
When Monticello loads a package into the image, any class that defines an
initialize method on the class side will be sent the initialize message. The message is sent only to classes that define this method on the class side. A class
that does not define this method will not be initialized, even if initialize is defined by one of its superclasses. NB: the initialize method is not invoked
when you merely reload a package!
Class initialization can be used to perform any number of checks or special actions. A particularly useful application is to add new instance variables to a class.
Class extensions are strictly limited to adding new methods to a class.
Sometimes, however, extension methods may need new instance variables
to exist.
Suppose, for example, that we want to extend the TestCase class of SUnit
with methods to keep track of the history of the last time the test was red.
We would need to store that information somewhere, but unfortunately we
cannot define instance variables as part of our extension.
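A class-side initialize method along the following lines could do this. It is a sketch under assumptions: the class name TestCaseExtension is the one used in the discussion that follows, the instance variable name lastRedRun is purely illustrative, and addInstVarName: is the selector found in older Pharo images (newer ones spell it addInstVarNamed:).

```smalltalk
TestCaseExtension class>>initialize
    "Add the instance variable to TestCase unless it is already there.
    The name lastRedRun is an illustrative choice."
    (TestCase instVarNames includes: 'lastRedRun')
        ifFalse: [ TestCase addInstVarName: 'lastRedRun' ]
```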
When our package is loaded, this code will be evaluated and the instance
variable will be added, if it does not already exist. Note that if you change
a class that is not in your package, the other package will become dirty. In
the previous example, the package SUnit contains TestCase. After installing
TestCaseExtension, the package SUnit will become dirty.
7.4
3. Load version 2
4. In the change sorter, you should now see the difference between versions 1 and 2. The change set may be saved on the filesystem by action-clicking on it and selecting file out. A DiffPerfect.X.cs file is now located
next to your Pharo image.
7.5
Kinds of repositories
Several kinds of repositories are supported by Monticello, each with different characteristics and uses. Repositories can be read-only, write-only or
read-write. Access rights may be defined globally or can be tied to a particular user (as in SmalltalkHub, for example).
HTTP. HTTP repositories are probably the most popular kind of repository
since this is the kind supported by SmalltalkHub.
The nice thing about HTTP repositories is that it is easy to link directly
to specific versions from web sites. With a little configuration work on the
HTTP server, HTTP repositories can be made browsable by ordinary web
browsers, WebDAV clients, and so on.
HTTP repositories may be used with an HTTP server other than
SmalltalkHub. For example, a simple configuration10 turns Apache into a
Monticello repository with restricted access rights:
"My apache2 install worked as a Monticello repository right out of the box on my
RedHat 7.2 server. For posterity's sake, here's all I had to add to my apache2 config:"
Alias /monticello/ /var/monticello/
<Directory /var/monticello>
    DAV on
    Options indexes
    Order allow,deny
    Allow from all
    AllowOverride None
    # Limit write permission to list of valid users.
    <LimitExcept GET PROPFIND OPTIONS REPORT>
        AuthName "Authorization Realm"
        AuthUserFile /etc/monticello-auth
        AuthType Basic
        Require valid-user
    </LimitExcept>
</Directory>
"This gives a world-readable, authorized-user-writable Monticello repository in
/var/monticello. I created /etc/monticello-auth with htpasswd and off I went.
I love Monticello and look forward to future improvements."
FTP. This is similar to an HTTP repository, except that it uses an FTP server
instead. An FTP server may also offer restricted access rights and different
FTP clients may be used to browse such a Monticello repository.
10 http://www.visoracle.com/squeak/faq/monticello-1.html
(path asFileReference).
MCRepositoryGroup default addRepository: repo ].
Using SmalltalkHub
SmalltalkHub is an online repository that you can use to store your Monticello
packages. An instance is running and accessible from http://smalltalkhub.com/.
Add this repository to Monticello by clicking +Repository, and then selecting HTTP.
Fill out the template with the URL corresponding to the project; you can copy the
above repository expression from the web page and paste it into the template. Since
you are not going to commit new versions of this package, you do not need to fill in
the user and password. Open the repository, select the latest version of Phexample
and click Load .
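The repository expression mentioned above is an ordinary Smalltalk expression. As a sketch, here is what it looks like for the GoferExample project of the PharoBooks account (not for Phexample, whose URL is not shown here). The second statement, which registers the repository programmatically, is optional, and the variable name repo is illustrative.

```smalltalk
| repo |
repo := MCHttpRepository
    location: 'http://smalltalkhub.com/mc/PharoBooks/GoferExample/main'
    user: ''
    password: ''.
"Empty user and password are enough for read-only access"
MCRepositoryGroup default addRepository: repo.
```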
Pressing the Join link on the SmalltalkHub home page will probably be
your first step if you do not have a SmalltalkHub account. Once you are a
member, + New Project allows you to create a new project.
7.6
Versions are stored in repositories as binary files. These files are commonly
called mcz files as they carry the extension .mcz. This stands for Monticello
zip since an mcz file is simply a zipped file containing the source code and
other meta-data.
An mcz file can be dragged and dropped onto an open
image file, just like a change set. Pharo will then prompt
you to ask if you want to load the package it contains.
Monticello will not know which repository the package
came from, however, so do not use this technique for development.
You may try to unzip such a file, for example to view the source code
directly, but normally, end users should not need to unzip these files themselves. If you unzip it, you will find the following members of the mcz file.
File contents Mcz files are actually ZIP archives that follow certain conventions. Conceptually a version contains four things:
Package. A version is related to a particular package. Each mcz file
contains a file called package that contains information about the
package's name.
VersionInfo. This is the meta-data about the snapshot. It contains the
author initials, date and time the snapshot was taken, and the ancestry of the snapshot. Each mcz file contains a member called version
which contains this information.
A version doesn't contain a full history of the source code. It's a snapshot of the code at a single point in time, with a UUID identifying that
snapshot, and a record of the UUIDs of all the previous snapshots it's
descended from.
Snapshot. A Snapshot is a record of the state of the package at a particular time. Each mcz file contains a directory named snapshot/. All
the members in this directory contain definitions of program elements,
which when combined, form the Snapshot. Current versions of Monticello only create one member in this directory, called source.st.
Dependencies. A version may depend on specific versions of other packages. An mcz file may contain a dependencies/ directory with a
member for each dependency. These members will be named after
each package the Monticello package depends upon. For example, a
Pier-All mcz file will contain files named Pier-Blog and Pier-Caching in
its dependencies directory.
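Because an mcz file is a plain ZIP archive, its members can also be listed from within the image. The following is a sketch under assumptions: the file name is the hypothetical Perfect-XX.1.mcz saved earlier in this chapter, and ZipArchive is assumed to understand readFrom:, memberNames and close as it does in recent Pharo images.

```smalltalk
| archive names |
archive := ZipArchive new.
archive readFrom: 'package-cache/Perfect-XX.1.mcz'.   "hypothetical path"
names := archive memberNames.
archive close.
names   "should include 'package', 'version' and 'snapshot/source.st'"
```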
It basically says that the version AA-ab.3 has an empty log message, was
created on January 10, 2008, by ab, and has an ancestor named AA-ab.2, ...
7.7
Chapter summary
This chapter has presented the functionality of Monticello in detail. The following points were covered:
Monticello packages are mapped to Smalltalk categories and method protocols.
If you add a package called Foo to Monticello, it will include all classes
in categories called Foo or starting with Foo-. It will also include all
methods in those categories, except those in protocols starting with *.
Finally, it will include all class extension methods in protocols called *foo
or starting with *foo- anywhere else in the system.
When you modify any methods or classes in a package, it will be
marked as dirty in Monticello, and can be saved to a repository.
There are many kinds of repositories, the most popular being HTTP
repositories, such as those hosted by SmalltalkHub.
Saved packages are cached locally in a directory called package-cache.
The Monticello repository inspector can be used to browse a repository.
You can select which versions of packages to load or unload.
You can create a new branch of a package by basing a new version on
another version which is earlier than the latest version. The repository inspector keeps track of the ancestry of packages and can tell you
which versions belong to separate branches.
Branches can be merged. Monticello offers a fine degree of control over
the resolution of conflicts between merged versions. The merged version will have as its ancestors the two versions from which it was merged.
Chapter 8
8.1
Figure 8.1: The browser shows that the class String gets the methods asUrl and
asUrlRelativeTo: from the package network-url
classVariableNames: ''
poolDictionaries: ''
category: 'Zork'
Figure 8.2: The change browser shows that the method String>>asUrl has
changed.
HTTP server which allows us to save projects (particularly packages) managed by Monticello. This is the equivalent of a forge: It provides the management of contributors and their status, visibility information, a wiki with RSS
feed. A source open to everybody is available at http://www.smalltalkhub.com/.
Figure 8.3: (left) Typical setup with clean and dirty packages loaded and
cached (right) Package published.
8.2
What is Gofer?
8.3
Using Gofer
Here is a typical Gofer script: it says that we want to load the package
PBE2GoferExample from the repository GoferExample that is available on
http://www.smalltalkhub.com in the account of PharoBooks.
Gofer new
    url: 'http://smalltalkhub.com/mc/PharoBooks/GoferExample/main';
    package: 'PBE2GoferExample';
    load
When the repository (HTTP or FTP) requires authentication, the message url:username:password: is available. Pay close attention: this is a single
message, so do not put a cascade between its parts. The message directory: supports
access to local files.
Gofer new
    url: 'http://smalltalkhub.com/mc/PharoBooks/GoferExample/main'
        username: 'pharoUser'
        password: 'pharoPwd';
    package: 'PBE2GoferExample';
    load.
"we work on the project PBE2GoferExample and provide credentials"
Gofer new
    url: 'http://smalltalkhub.com/mc/PharoBooks/GoferExample/main/PBE2GoferExample'
        username: 'pharoUser'
        password: 'pharoPwd';
    package: 'PBE2GoferExample';    "define the package to be loaded"
    disablePackageCache;            "disable package lookup in local cache"
    disableRepositoryErrors;        "stop the error raising"
    load.                           "load the package"
Since the same public servers are often used, Gofer's API offers a number
of shortcuts to shorten the scripts. Often, we want to write a script and give
it to other people to load our code. In such a case, having to specify a password is not really adequate. Here is an example for SmalltalkHub (which has
some verbose URLs such as http://smalltalkhub.com/mc/PharoBooks/GoferExample/main
for the project GoferExample). We use the smalltalkhubUser:project: message and just specify the minimal information. In this chapter, we also use
squeaksource3: as a shortcut for http://ss3.gemtalksystems.com/ss.
"Specifying a user but no password"
Gofer new
    smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
    package: 'PBE2GoferExample';
    load
In addition, when Gofer fails to load a package from a specified URL, it looks in the local cache, which is normally at the root of your
image. It is possible to force Gofer not to use the cache using the message
disablePackageCache or to use it using the message enablePackageCache.
In a similar manner, Gofer returns an error when one of the repositories is not reachable. We can instruct it to ignore such errors using the message disableRepositoryErrors. To re-enable them, we can use the message
enableRepositoryErrors.
Package Identification
Once a URL and the options are specified, we should define the packages
we want to load. Using the message version: defines the exact version to load,
while the message package: should be used to load the latest version available
in all the repositories.
The following example loads one specific version of the package, here version 1.
Gofer new
    smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
    version: 'PBE2GoferExample-janniklaval.1';
    load
We can also specify some constraints to identify packages using the message package: aString constraint: aBlock to pass a block.
For example the following code will load the latest version of the package
saved by the developer named janniklaval.
Gofer new
    smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
    package: 'PBE2GoferExample'
        constraint: [ :version | version author = 'janniklaval' ];
    load
8.4
Gofer actions
url: 'http://smalltalkhub.com/mc/JLaval/Phratch/main';
package: 'Collections-Arithmetic';
package: 'Sound';
package: 'Settings-Sound';
package: 'SoundScores';
package: 'SoundMorphicUserInterface';
package: 'Phratch';
load
Note that such scripts load the latest versions of the packages and are
therefore fragile: if a new package version is published, you will
load it even if it is unstable. In general it is good practice to control
the version of the external components we rely on and use the latest version
only for our own current development. Such problems can be solved with
Metacello, the tool to express configurations and load them.
Other protocols
Gofer also supports FTP as well as loading from a local directory. We basically use the same messages as before, with some changes.
For FTP, we should specify the URL using 'ftp' as the scheme.
Gofer new
    url: 'ftp://wtf-is-ftp.com/code';
    ...
To work on a local directory, the message directory: followed by the absolute path of the directory should be used. Here we specify that the directory
Finally, it is possible to look for packages in a repository and all its subfolders using the Kleene star.
Gofer new
    directory: '/home/pharoer/hacking/MCPackages/*';
    ...
Once a Gofer instance is parametrized, we can send it messages to perform different actions. Here is a list of the possible actions. Some of them are
described later.
load              Load the specified packages.
update            Update the package loaded versions.
merge             Merge the distant version with the one currently loaded.
localChanges      Show the list of changes between the basis version and the version currently modified.
remoteChanges     Show the changes between the version currently modified and the version published on a server.
cleanup           Cleanup packages: obsolete system information is cleaned.
commit / commit:  Save the packages to a distant server with a message log.
revert            Reload previously loaded packages.
recompile         Recompile packages.
unload            Unload the packages from the image.
fetch             Download the remote package versions from a remote server to the local cache.
push              Upload the versions from the local cache to the remote server.
Changes present in the working copy are merged with the code of the remote
copy. It is often the case that after a merge, the working copy gets dirty and
should be republished. The new version will contain the current changes
and the changes of the remote version. In case of conflicts the user will be
warned or else the operation will happen silently.
Gofer new
    smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
    package: 'PBE2GoferExample';
    merge
The message update loads the remote version in the image. The modifications of the working copy are lost.
The message revert resets the local version, i.e., it loads the current version
again. The changes of the working copy are then lost.
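The two messages can be sketched with the same example package used for merge. This is only an illustration: the repository and package names are those of the surrounding examples, and both operations discard local modifications, so they are best tried on a scratch image.

```smalltalk
"Replace the working copy with the latest remote version.
 Local modifications are lost."
Gofer new
    smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
    package: 'PBE2GoferExample';
    update.

"Reload the version that is currently loaded, discarding local changes."
Gofer new
    smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
    package: 'PBE2GoferExample';
    revert
```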
The commit and commit: operations. Once we have merged or changed a
package we want to save it. For this we can use the messages commit and
commit:. The second one expects a comment, which is good practice in
general.
Gofer new
"We save the package in the repository"
smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
package: 'PBE2GoferExample';
"We comment the changes and save"
commit: 'I try to use the message commit: '
Gofer new
smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
"we add the latest version of PBE2GoferExample"
package: 'PBE2GoferExample';
"we browse the latest version published on the server"
browseRemoteChanges
The unload operation. The message unload unloads the packages from the
image. Note that using the Monticello browser you can delete a package, but
such an operation does not remove the code of the classes associated with
the package; it just destroys the package. Unloading a package destroys the
package and the classes it contains.
The following code unloads the package and its classes from the current
image.
Gofer new
package: 'PBE2GoferExample';
unload
Note that you cannot unload Gofer itself that way. Gofer gofer unload does
not work.
The fetch and push operations. Since Monticello is a distributed versioning system, it is a good idea to save all the versions you want locally, without being
forced to publish on a remote server. This is especially true when working
off-line. As it is tedious to synchronize all the local and remote published
packages, the messages fetch and push are there to support you in this task.
The message fetch copies the packages that are missing from the remote
server into your local cache. The packages are not loaded in Pharo. After a
fetch you can load the packages even if the remote server breaks down.
Gofer new
smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
package: 'PBE2GoferExample';
fetch
Now, if you want to load your packages locally remember to set up the
lookup so that it takes into account the local cache and disables errors as
presented in the beginning of this chapter (messages disableRepositoryErrors
and enablePackageCache).
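Put together, loading from the local cache could look like the sketch below. It simply combines the two messages named above with the example repository of this section; whether the remote repository is reachable does not matter once the package has been fetched.

```smalltalk
"Load from the local package cache, ignoring unreachable repositories."
Gofer new
    enablePackageCache;
    disableRepositoryErrors;
    smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
    package: 'PBE2GoferExample';
    load
```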
The message push performs the inverse operation. It publishes locally
available packages to the remote server. All the packages that you published
locally are then pushed to the server.
Gofer new
smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
package: 'PBE2GoferExample';
push
As a pattern, we always keep the copies of all the versions of our projects
or the projects we used in our local cache. This way we are autonomous from
any network failure and the packages are backed up in our regular backup.
With these two messages, it is easy to write a script sync that synchronizes
local and remote repositories.
Gofer new
smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
package: 'PBE2GoferExample';
push.
Gofer new
smalltalkhubUser: 'PharoBooks' project: 'GoferExample';
package: 'PBE2GoferExample';
fetch
Automating Answers
Sometimes package installation asks for information such as passwords.
With the systematic use of a build server, packages will probably stop
doing that, but it is important to know how to supply answers to these
questions from within a script. The message valueSupplyingAnswers: supports such a
task.
[ Gofer new
squeaksource: 'Seaside30';
package: 'LoadOrderTests';
load ]
valueSupplyingAnswers: {
{'Load Seaside'. true}.
{'SqueakSource User Name'. 'pharoUser'}.
{'SqueakSource Password'. 'pharoPwd'}.
{'Run tests'. false}.
}
This message should be sent to a block, giving a list of questions and their
answers as shown in the previous example.
Configuration Loading
Gofer also supports Metacello configuration loading. It provides the
following messages to handle configurations: configuration, configurationOf:, loadVersion:,
loadDevelopment, and loadStable.
The following example loads the development version of NativeBoost. You
only need to specify the NativeBoost project: Gofer loads
ConfigurationOfNativeBoost and then loads the development version.
Gofer new
smalltalkhubUser: 'Pharo' project: 'NativeBoost';
configuration;
loadDevelopment
When the repository name does not match the name of the configuration
you should use configurationOf: and provide the name of the configuration
class.
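A hedged sketch of the explicit form follows. It reuses the NativeBoost repository only for illustration, and it assumes that Gofer prepends 'ConfigurationOf' to the argument of configurationOf:, so that 'NativeBoost' designates ConfigurationOfNativeBoost. Check the method comment of configurationOf: in your image before relying on this.

```smalltalk
"Name the configuration explicitly, then load its stable version.
 Assumption: the argument is the project name without the
 'ConfigurationOf' prefix."
Gofer new
    smalltalkhubUser: 'Pharo' project: 'NativeBoost';
    configurationOf: 'NativeBoost';
    loadStable
```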
8.5
Gofer offers a nice facility to gather all the packages together in a given repository via the message allResolved.
Script 8.1: Getting the number of packages in a repository.
(Gofer new
smalltalkhubUser: 'Pharo' project: 'NativeBoost';
allResolved) size
The following script groups the package versions by package name and returns a dictionary.
Script 8.2: Grouping versions by package names.
((Gofer new
smalltalkhubUser: 'Pharo' project: 'NativeBoost';
allResolved)
groupedBy: [ :each | each packageName])
Script 8.3: Getting the package list for the Kozen project hosted on SS3.
((Gofer new
squeaksource3: 'Kozen';
allResolved)
groupedBy: [ :each | each packageName]) keys
Fetching packages
Here is a script to fetch all the packages of a given repository. It is useful for
grabbing your files and having a version locally.
Script 8.4: Fetching all the packages of a repository
| go |
go := Gofer new squeaksource3: 'Pharo20'.
go allResolved
do: [ :each |
self crLog: each packageName.
go package: each packageName;
fetch]
Script 8.5: Fetching all the refactoring packages from the Pharo2.0 repository
| go |
go := Gofer new.
go squeaksource3: 'Pharo20'.
(go allResolved select: [ :each | 'Refactoring*' match: each packageName])
do: [ :pack |
self crLog: pack packageName.
go package: pack packageName; fetch]
password: 'pharoPwd').
((FileSystem workingDirectory / 'package-cache')
allEntries
select: [ :each | '*.mcz' match: each])
do: [ :f | go version: ('.' join: (f findTokens: $.) allButLast); push]
The following script uses the new filesystem library. It also shows how
to get the package names rather than the versions, and it takes care to
publish only mcz files. It can be extended to publish specific packages
selectively.
Script 8.7: How to publish package files to a new repository using Pharo 2.0
| go |
go := Gofer new.
go repository: (MCHttpRepository
location: 'http://ss3.gemtalksystems.com/ss/rb-pharo'
user: 'pharoUser'
password: 'pharoPwd').
(((FileSystem disk workingDirectory / 'package-cache')
allFiles select: [:each | '*.mcz' match: each basename])
groupedBy: [:each | (each base copyUpToLast: $-) ])
keys do: [:name | go package: name; push]
8.6
Chapter summary
Gofer provides a robust and stable implementation to script the management of your packages. When your project grows, you should really consider using Metacello (see Chapter 9).
In this chapter, we introduced how to script package management with Gofer.
The method load allows us to load packages from sources given with
the methods url: and package:.
The method url: supports FTP and local directory access.
Chapter 9
9.1
Introduction
A package management system provides a consistent way to install packages. Package management systems are sometimes incorrectly referred to as
installers. This can lead to confusion, because a package management system does a lot more than install software. You may have used package management systems in other contexts: examples include Envy (in VisualAge
Smalltalk), Maven (in Java), and apt-get/aptitude (in Debian and Ubuntu).
One of the key features of a package management system is that it correctly loads any package: you should never need to manually install anything.
To make this possible, each dependency, and the dependencies of the dependencies, and so on, must be specified in the description of the package, with
enough information to allow the package management tools to load them in
the correct order.
As an example of the power of Metacello, you can take a PharoCore image, and load any package of any project without any problems with dependencies. Of course, Metacello does not do magic: this only works as long as
the package developers have properly defined the dependencies.
9.2
Pharo provides three tools for managing software packages; they are closely
related, but each has its own purpose. The tools are Monticello, which manages versions of source code, Gofer, which is a scripting interface for Monticello, and Metacello, which is a package management system.
Monticello: source code versioning. Source code versioning is the process
of assigning unique versions to particular software states. It lets you
commit a new version, update to a new version committed by someone
else, merge changes, look at the differences between versions, revert to
an older version, etc.
Pharo uses the Monticello source code versioning system, which manages Monticello packages. Monticello lets us do all of the above operations on individual packages, but Monticello does not provide a good
way to easily specify dependencies between packages, identify stable
versions of a package, or group packages into meaningful units. Chapter 7 describes it.
Gofer: Monticello's scripting interface. Gofer is a small tool that sits on top
of Monticello: it is used to load, update, merge, difference, revert, commit, recompile and unload groups of Monticello packages. Gofer also
makes sure that these operations are performed as cleanly as possible.
For more information, see Chapter 8.
Metacello: package management. Metacello introduces the notion of
9.3
Metacello features
Versions. A version identifies the exact version of each package and project
that should be loaded. A version is based upon a baseline version. For
each package in the baseline version, the Monticello file name (e.g.,
Metacello-Base-dkh.152) is specified. For each project in the baseline
version, the Metacello version number is specified.
ConfigurationOfProjectA, in Figure 9.1 contains two baselines (baseline
0.4 and 0.5) and four versions (version 0.4, 0.4.1, 0.5, and 0.6). Baseline 0.4 is
composed of two packages (PackageA and PackageB). Version 0.4 is based
on baseline 0.4 and specifies the version for each of the packages (PackageA-version.5 and PackageB-version.3). Version 0.4.1 is also based on baseline
0.4, but specifies a different version for PackageA (PackageA-version.7).
Baseline 0.5 is composed of 3 packages (PackageA, PackageB, and PackageC) and it depends on an external project (ProjectB). A new package (PackageC) and a project dependency (ProjectB) were added to the project, so a
new baseline version reflecting the new structure needed to be created. Version 0.5 is based on baseline 0.5 and specifies the versions of the packages
(PackageA-version.6, PackageB-version.4 and PackageC-version.1) and the version of the dependent project (ProjectB-version3).
9.4
ConfigurationOfCoolBrowser>>version01: spec
<version: '0.1'>
spec for: #common do: [
spec blessing: #release.
spec repository: 'http://www.example.com/CoolBrowser'.
spec
package: 'CoolBrowser-Core' with: 'CoolBrowser-Core-BobJones.10';
package: 'CoolBrowser-Tests' with: 'CoolBrowser-Tests-JohnLewis.3' ]
The method version01: spec builds a description of version 0.1 of the project
in the object spec. The common code for version 0.1 (specified using the message for:do:) consists of particular versions of the packages named CoolBrowser-Core and CoolBrowser-Tests. These are specified with the message package: packageName with: versionName. These versions are available in the Monticello repository http://www.example.com/CoolBrowser, which is specified using
the message repository:. The blessing: method is used to denote that this is
a released version and that the specification will not be changed in the future. The blessing #development should be used when the version has not
stabilized.
Now let us look at more details.
Immediately after the method selector you see the pragma definition:
<version: '0.1'>. The pragma version: indicates that the version created in
this method should be associated with version 0.1 of the CoolBrowser
project. That is why we said that the name of the method is not that important. Metacello uses the pragma, not the method name, to identify
the version being defined.
The argument of the method, spec, is the only variable in the method
and it is used as the receiver of four different messages: for:do:, blessing:,
package:with:, and repository:.
Each time a block is passed as argument of the messages (for:do:,
package:with:. . . ) a new object is pushed on a stack and the messages
within the block are sent to the object on the top of the stack.
Specification objects. A spec object is an object representing all the information about a given version. A version is just a number while the specification is the object. You can access the specification using the spec message,
though normally this is not needed.
(ConfigurationOfCoolBrowser project version: '0.1') spec
This answers an object (instance of class MetacelloMCVersionSpec) that contains exactly the information of the method that defines version 0.1.
Creating a new version. Let us assume that version 0.2 of our
project consists of the package versions CoolBrowser-Core-BobJones.15 and
CoolBrowser-Tests-JohnLewis.8 and a new package CoolBrowser-Addons with
version CoolBrowser-Addons-JohnLewis.3. We specify this new configuration
by creating the following method named version02:.
ConfigurationOfCoolBrowser>>version02: spec
<version: '0.2'>
spec for: #common do: [
spec repository: 'http://www.example.com/CoolBrowser'.
spec
package: 'CoolBrowser-Core' with: 'CoolBrowser-Core-BobJones.15';
package: 'CoolBrowser-Tests' with: 'CoolBrowser-Tests-JohnLewis.8';
package: 'CoolBrowser-Addons' with: 'CoolBrowser-Addons-JohnLewis.3']
How to manage multiple repositories. You can also add multiple repositories to a spec. You just have to send the message repository: several times.
ConfigurationOfCoolBrowser>>version02: spec
...
9.5
that you have a coherent set of package versions. To load versions, you send
the message load to a version. Here are some examples for loading versions
of the CoolBrowser:
(ConfigurationOfCoolBrowser project version: '0.1') load.
(ConfigurationOfCoolBrowser project version: '0.2') load.
Note that in addition, if you print the result of each expression, you get
a list of packages in load order: Metacello manages not only which packages
are loaded, but also the order. It can be handy to debug configurations.
Selective Loading. By default, the load message loads all the packages associated with the version (as we will see later, we can change that by defining
a particular group called default). If you want to load a subset of the packages
in a project, you should list the names of the packages that you are interested
in as an argument to the load: method:
(ConfigurationOfCoolBrowser project version: '0.2') load:
{ 'CoolBrowser-Core' .
'CoolBrowser-Addons' }.
Debugging Configuration. If you want to simulate the loading of a configuration, without actually loading it, you should use record (or record:) instead
of load (or load:). Then, to get the result of the simulation, you should send it
the message loadDirective as follows:
((ConfigurationOfCoolBrowser project version: '0.2') record:
{ 'CoolBrowser-Core' .
'CoolBrowser-Addons' }) loadDirective.
Apart from load and record, there is also another useful method which is
fetch (and fetch:). As explained, record simply records which Monticello files
should be downloaded and in which order. fetch accesses and downloads all
the needed Monticello files. Just for the record, in the implementation load
first does a fetch and then a doLoad.
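For illustration, fetch and fetch: are used exactly like load and load:. The sketch below reuses the CoolBrowser configuration of this chapter, which is itself an example project.

```smalltalk
"Download the Monticello files needed by version 0.2, without loading them."
(ConfigurationOfCoolBrowser project version: '0.2') fetch.

"Same, restricted to two packages."
(ConfigurationOfCoolBrowser project version: '0.2') fetch:
    { 'CoolBrowser-Core' .
      'CoolBrowser-Addons' }.
```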
9.6
A project is generally composed of several packages, which often have dependencies on other packages. It is also likely that a certain package depends
on a specific version of another package. Handling dependencies correctly
is really important and is one of the major benefits of Metacello. There are
two types of dependencies:
as described in Figure 9.4. The specifications for versions 0.1 and 0.2 did not
capture this dependency. Here is a new configuration that does:
ConfigurationOfCoolBrowser>>version03: spec
<version: '0.3'>
spec for: #common do: [
spec repository: 'http://www.example.com/CoolBrowser'.
spec
package: 'CoolBrowser-Core' with: 'CoolBrowser-Core-BobJones.15';
package: 'CoolBrowser-Tests' with: [
spec
file: 'CoolBrowser-Tests-JohnLewis.8';
requires: 'CoolBrowser-Core' ];
package: 'CoolBrowser-Addons' with: [
spec
file: 'CoolBrowser-Addons-JohnLewis.3';
requires: 'CoolBrowser-Core' ]].
9.7
Baselines
Figure 9.5: Version 0.4 now imports a baseline that expresses the dependencies between packages.
it, as shown in Figure 9.5. The baseline specifies a repository, the packages,
and the dependencies between those packages, but it does not specify the
specific versions of the packages.
To define a version in terms of a baseline, we use the pragma
<version:imports:>, as follows:
ConfigurationOfCoolBrowser>>version04: spec
<version: '0.4' imports: #('0.4-baseline')>
spec for: #common do: [
spec
package: 'CoolBrowser-Core' with: 'CoolBrowser-Core-BobJones.15';
package: 'CoolBrowser-Tests' with: 'CoolBrowser-Tests-JohnLewis.8';
package: 'CoolBrowser-Addons' with: 'CoolBrowser-Addons-JohnLewis.3'
].
Figure 9.6: A second version (0.5) imports the same baseline as version 0.4.
Loading Baselines
Even though version 0.4-baseline does not contain explicit package version
information, you can still load it!
(ConfigurationOfCoolBrowser project version: '0.4-baseline') load.
When the loader encounters a package without version information, it attempts to load the most recent version of the package from the repository.
Sometimes, especially when several developers are working on a project,
it may be useful to load a baseline version to access the most recent work of all
of the developers. In such a case, the baseline version is really the bleeding
edge version.
Declaring a new version. Now suppose that we want to create a new version of our project, version 0.5, that has the same structure as version 0.4, but
contains different versions of the packages. We can capture this content by
importing the same baseline; this relationship is depicted in Figure 9.6.
ConfigurationOfCoolBrowser>>version05: spec
<version: '0.5' imports: #('0.4-baseline')>
spec for: #common do: [
spec
package: 'CoolBrowser-Core' with: 'CoolBrowser-Core-BobJones.20';
package: 'CoolBrowser-Tests' with: 'CoolBrowser-Tests-JohnLewis.8';
package: 'CoolBrowser-Addons' with: 'CoolBrowser-Addons-JohnLewis.6' ].
Creating a baseline for a big project will often require some time and
effort, since it must capture all the dependencies of all the packages, as well
as some other things that we will look at later. However, once the baseline is
defined, creating new versions of the project is greatly simplified and takes
very little time.
9.8
Groups
instead of having to explicitly list all of the test packages, like this:
(ConfigurationOfCoolBrowser project version: '0.6')
load: #('CoolBrowser-Tests' 'CoolBrowser-AddonsTests').
Figure 9.7: A baseline with six groups: default, Core, Extras, Tests, CompleteWithoutTests and CompleteWithTests.
Groups are defined in baselines. We define the groups in the baseline version, since a group is a structural component. Note that the default
group will be used in the subsequent sections. Here the default group specifies that the two packages CoolBrowser-Core and CoolBrowser-Addons
will be loaded when the method load is used.
Using this baseline, we can now define version 0.6 to be the same as version 0.5, except for the addition of the new package CoolBrowser-AddonsTests.
ConfigurationOfCoolBrowser>>version06: spec
<version: '0.6' imports: #('0.6-baseline')>
spec for: #common do: [
spec
package: 'CoolBrowser-Core' with: 'CoolBrowser-Core-BobJones.20';
package: 'CoolBrowser-Tests' with: 'CoolBrowser-Tests-JohnLewis.8';
package: 'CoolBrowser-Addons' with: 'CoolBrowser-Addons-JohnLewis.6' ;
package: 'CoolBrowser-AddonsTests' with: 'CoolBrowser-AddonsTests-
Examples. Once you have defined a group, you can use its name anywhere
you would use the name of a project or package. The load: method takes as
parameter the name of a package, a project, a group, or a collection of those
items. All of the following statements are possible:
(ConfigurationOfCoolBrowser project version: '0.6') load: 'CoolBrowser-Core'.
"Load a single package"
(ConfigurationOfCoolBrowser project version: '0.6') load: 'Core'.
"Load a single group"
(ConfigurationOfCoolBrowser project version: '0.6') load: 'CompleteWithTests'.
"Load a single group"
(ConfigurationOfCoolBrowser project version: '0.6')
load: #('CoolBrowser-Core' 'Tests').
"Loads a package and a group"
(ConfigurationOfCoolBrowser project version: '0.6')
load: #('CoolBrowser-Core' 'CoolBrowser-Addons' 'Tests').
"Loads two packages and a group"
(ConfigurationOfCoolBrowser project version: '0.6')
load: #('CoolBrowser-Core' 'CoolBrowser-Tests').
"Loads two packages"
(ConfigurationOfCoolBrowser project version: '0.6') load: #('Core' 'Tests').
"Loads two groups"
The groups default and ALL. The default group is a special one. The load
message loads the members of the default group while loading the ALL group
will load all the packages. Moreover, by default, default loads ALL!
(ConfigurationOfCoolBrowser project version: '0.6') load.
We believe that loading the default group should load the tests as well. This
is why we either put the Tests group explicitly in the default group, or we do
not specify a default at all.
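The first option can be sketched as a baseline whose default group explicitly includes the tests. The selector baseline061: and the version string '0.6.1-baseline' are hypothetical names chosen for this illustration, and the package list is deliberately shortened.

```smalltalk
"Hypothetical baseline: 'default' is defined so that load also brings the tests."
ConfigurationOfCoolBrowser>>baseline061: spec
    <version: '0.6.1-baseline'>
    spec for: #common do: [
        spec blessing: #baseline.
        spec repository: 'http://www.example.com/CoolBrowser'.
        spec
            package: 'CoolBrowser-Core';
            package: 'CoolBrowser-Tests' with: [ spec requires: 'CoolBrowser-Core' ].
        spec
            group: 'Core' with: #('CoolBrowser-Core');
            group: 'Tests' with: #('CoolBrowser-Tests');
            group: 'default' with: #('Core' 'Tests') ]
```

Leaving out the last group: line gives the second option: without an explicit default, load falls back to ALL.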
9.9
In the same way that a package can depend on other packages, a project can
depend on other projects. For example, Pier, a content management system
(CMS), depends on Magritte and Seaside. A project can depend on the entirety of one or more other projects, on a group of packages from another
project, or on just one or two packages from another project.
This works, up to a point. The shortcoming of this approach is that because project B is not described by a Metacello configuration, the dependencies of B are not managed. That is, any dependencies of package B will not
be loaded. Our recommendation is that in this case, you take the time to
create a configuration for project B.
We have named the project reference CoolBrowser ALL. The name of the
project reference is arbitrary. You can select the name you want, although
it is recommended that you choose a name that makes sense for that project
reference. In the specification for the CoolToolSet-Core package, we have specified that CoolBrowser ALL is required. As will be explained later, the message
project:with: allows one to specify the exact version of the project you want to
load.
The message loads: specifies which packages or groups to load. The parameter of loads: can be the same as the one of load:, i.e., the name of a package,
the name of a group, or a collection of these things. Notice that calling loads:
is optional; you only need it if you want to load something different from
the default.
Now we can load CoolToolSet like this:
(ConfigurationOfCoolToolSet project version: '0.1') load.
The message className: specifies the name of the class that contains the
project metadata; in this case ConfigurationOfCoolBrowser.
The messages file: and repository: give Metacello the information that
it might need to search for and load class ConfigurationOfCoolBrowser, if
it is not present in the image. The argument of file: is the name of the
Monticello package that contains the metadata class, and the argument
of repository: is the URL of the Monticello repository that contains that
package. If the Monticello repository is protected, then you should use
the message repository:username:password: instead.
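As a sketch of how className:, file: and repository: fit together in a project reference, consider the baseline below. The selector and version string are illustrative, the URLs are the example.com placeholders used throughout this chapter, and the reference name 'CoolBrowser default' is borrowed from the later examples.

```smalltalk
"Illustrative baseline: one project reference and one package that requires it."
ConfigurationOfCoolToolSet>>baseline01: spec
    <version: '0.1-baseline'>
    spec for: #common do: [
        spec blessing: #baseline.
        spec repository: 'http://www.example.com/CoolToolSet'.
        spec project: 'CoolBrowser default' with: [
            spec
                className: 'ConfigurationOfCoolBrowser';
                loads: #('default');
                file: 'CoolBrowser-Metacello';
                repository: 'http://www.example.com/CoolBrowser' ].
        spec
            package: 'CoolToolSet-Core' with: [ spec requires: 'CoolBrowser default' ] ]
```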
Now we can load CoolToolSet like this:
(ConfigurationOfCoolToolSet project version: '0.1') load.
CoolBrowser default loads the default group and the reference named Cool-
As we did for internal dependencies, baseline 0.2-baseline (like 0.1-baseline) does not specify the version of the project the configuration depends
on. Instead, we do this in the version method using the message project:with:.
ConfigurationOfCoolToolSet>>version02: spec
<version: '0.2' imports: #('0.2-baseline' )>
spec for: #common do: [
spec blessing: #beta.
spec
package: 'CoolToolSet-Core' with: 'CoolToolSet-Core-AlanJay.1';
package: 'CoolToolSet-Tests' with: 'CoolToolSet-Tests-AlanJay.1';
project: 'CoolBrowser default' with: '1.3';
project: 'CoolBrowser Tests' with: '1.3'].
You can also use the loads: message in the project reference
to specify which packages of the project you want to load. Such a solution is
nice because you factor the information into the project reference and you do
not have to duplicate it in all the versions.
ConfigurationOfSoup>>version10: spec
<version: '1.0' imports: #('1.0-baseline')>
spec for: #pharo do: [
spec project: 'XMLSupport' with: [
spec
versionString: #stable;
loads: #('XML-Parser' 'XML-Tests-Parser');
repository: 'http://ss3.gemstone.com/ss/xmlsupport' ].
spec
package: 'Soup-Core' with: 'Soup-Core-sd.11';
package: 'Soup-Tests-Core' with: 'Soup-Tests-Core-sd.3';
package: 'Soup-Help' with: 'Soup-Help-StephaneDucasse.2' ].
<version: '0.2-baseline'>
spec for: #common do: [
spec blessing: #baseline.
spec repository: 'http://www.example.com/CoolToolSet'.
spec
project: 'CoolBrowser default' with: [
spec
loads: #('default');
repository: 'http://www.example.com/CoolBrowser';
file: 'CoolBrowser-Metacello'];
project: 'CoolBrowser Tests'
copyFrom: 'CoolBrowser default'
with: [ spec loads: #('Tests').].
spec
package: 'CoolToolSet-Core' with: [ spec requires: 'CoolBrowser default' ];
package: 'CoolToolSet-Tests' with: [
spec requires: #('CoolToolSet-Core' 'CoolBrowser Tests') ].].
9.10
We want to discuss the difference between depending on a package and depending on a project, and the different ways to express it. Consider the following
baseline 1.1 from Fame.
baseline11: spec
<version: '1.1-baseline'>
spec for: #'common' do: [
spec blessing: #'baseline'.
spec description: 'Baseline 1.1 first version on SmalltalkHub, copied from baseline 1.0 on SqueakSource'.
spec repository: 'http://www.smalltalkhub.com/mc/Moose/Fame/main'.
spec
package: 'Fame-Core';
package: 'Fame-Util';
package: 'Fame-ImportExport' with: [spec requires: #('Fame-Core' ) ];
package: 'Fame-SmalltalkBinding' with: [spec requires: #('Fame-Core' ) ];
package: 'Fame-Example';
package: 'Phexample' with: [spec repository: 'http://smalltalkhub.com/mc/PharoExtras/Phexample/main' ];
package: 'Fame-Tests-Core' with: [spec requires: #('Fame-Core' 'Fame-Example' 'Phexample' ) ].
spec
group: 'Core' with: #('Fame-Core' 'Fame-ImportExport' 'Fame-Util' 'Fame-SmalltalkBinding' );
group: 'Tests' with: #('Fame-Tests-Core' ) ].
project: 'PhexampleCore'
with: [ spec
versionString: #stable;
loads: #('Core');
repository: 'http://www.smalltalkhub.com/mc/Phexample/main' ].
....
'Fame-Tests-Core' with: [spec requires: #('Fame-Core' 'Fame-Example' 'PhexampleCore' ) ].
9.11
Occasionally, you may find that you need to execute some code either before
or after a package or project is loaded. For example, if you are installing a
System Browser it would be a good idea to register it as default after it is
loaded. Or maybe you want to open some workspaces after the installation.
Metacello provides this feature by means of the messages preLoadDoIt: and
postLoadDoIt:. The arguments to these messages are selectors of methods defined on the configuration class as shown below. For the moment, these pre- and post-scripts can be defined for a single package or for an entire project.
Continuing with our example:
ConfigurationOfCoolBrowser>>version08: spec
<version: '0.8' imports: #('0.7-baseline')>
spec for: #common do: [
spec
package: 'CoolBrowser-Core' with: [
spec
file: 'CoolBrowser-Core-BobJones.20';
preLoadDoIt: #preloadForCore;
postLoadDoIt: #postloadForCore:package: ];
....
package: 'CoolBrowser-AddonsTests' with: 'CoolBrowser-AddonsTests-JohnLewis.1' ].
ConfigurationOfCoolBrowser>>preloadForCore
Transcript show: 'This is the preload script. Sorry I had no better idea'.
ConfigurationOfCoolBrowser>>postloadForCore: loader package: packageSpec
Transcript cr;
show: '#postloadForCore executed, Loader: ', loader printString,
' spec: ', packageSpec printString.
Smalltalk at: #SystemBrowser ifPresent: [:cl | cl default: (Smalltalk classNamed:
#CoolBrowser)].
In this example, we added pre and post load scripts at project level.
Again, the selectors can receive 0, 1 or 2 arguments.
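A project-level variant could look like the following sketch, where the scripts are attached to the spec itself rather than to one package. The selectors preLoadForCoolBrowser and postLoadForCoolBrowser are hypothetical names, both scripts take no argument, and the package list is shortened to a single entry.

```smalltalk
ConfigurationOfCoolBrowser>>version08: spec
    <version: '0.8' imports: #('0.7-baseline')>
    spec for: #common do: [
        spec blessing: #release.
        "Scripts attached to the spec itself apply to the whole project."
        spec preLoadDoIt: #preLoadForCoolBrowser.
        spec postLoadDoIt: #postLoadForCoolBrowser.
        spec
            package: 'CoolBrowser-Core' with: 'CoolBrowser-Core-BobJones.20' ]

ConfigurationOfCoolBrowser>>preLoadForCoolBrowser
    "Hypothetical pre-load script."
    Transcript show: 'Before loading CoolBrowser'; cr.

ConfigurationOfCoolBrowser>>postLoadForCoolBrowser
    "Hypothetical post-load script."
    Transcript show: 'After loading CoolBrowser'; cr.
```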
9.12
Metacello automatically loads the package of the platform in use. But to do that,
we need to specify platform-specific information using the method for:do: as
shown in the following example. Here we define that a different package version will be loaded depending on the platform. The platform-specific packages will be loaded in addition to the common ones, depending on the
platform on which you execute the script.
ConfigurationOfCoolBrowser>>version09: spec
<version: '0.9' imports: #('0.9-baseline')>
spec for: #common do: [
...
spec
...
package: 'CoolBrowser-AddonsTests' with: 'CoolBrowser-AddonsTests-JohnLewis.1' ].
spec for: #gemstone do: [
spec package: 'CoolBrowser-Platform' with: 'CoolBrowser-PlatformGemstoneBobJones.4'.].
spec for: #pharo do: [
spec package: 'CoolBrowser-Platform' with: 'CoolBrowser-PlatformPharoJohnLewis.7'.].
Specifying versions is only one aspect: you should also specify
baseline-specific information.
ConfigurationOfCoolBrowser>>baseline09: spec
<version: '0.9-baseline'>
spec for: #common do: [
spec blessing: #baseline.
spec repository: 'http://www.example.com/CoolBrowser'.
spec
package: 'CoolBrowser-Core';
package: 'CoolBrowser-Tests' with: [ spec requires: 'CoolBrowser-Core' ];
package: 'CoolBrowser-Addons' with: [ spec requires: 'CoolBrowser-Core' ];
package: 'CoolBrowser-AddonsTests' with: [
spec requires: #('CoolBrowser-Addons' 'CoolBrowser-Tests' ) ].
spec
group: 'default' with: #('CoolBrowser-Core' 'CoolBrowser-Addons' );
group: 'Core' with: #('CoolBrowser-Core' 'CoolBrowser-Platform' );
group: 'Extras' with: #('CoolBrowser-Addons');
group: 'Tests' with: #('CoolBrowser-Tests' 'CoolBrowser-AddonsTests' );
group: 'CompleteWithoutTests' with: #('Core' 'Extras' );
group: 'CompleteWithTests' with: #('CompleteWithoutTests' 'Tests' )].
spec for: #gemstone do: [
Loading order. Notice that if you are in a system where the platform attributes are (#common #squeakCommon #pharo #'pharo2.x' #'pharo2.0.x') (you can
obtain this information by evaluating ConfigurationOf project attributes) and you have
specified three sections such as #common, #pharo and #'pharo2.0.x', these sections will be loaded one after the other.
ConfigurationOfCoolBrowser>>baseline09: spec
<version: '0.9-baseline'>
spec for: #common do: [
Finally, note that the method for:do: is not only used to specify a
platform-specific package, but also for anything that has to do with different dialects.
You can put whatever you want from the configuration inside that block. For
example, you can define, change and customize groups, packages, repositories, etc., for each dialect and do this:
ConfigurationOfCoolBrowser>>baseline010: spec
<version: '0.10-baseline'>
spec for: #common do: [
spec blessing: #baseline.].
spec for: #pharo do: [
spec repository: 'http://www.pharo.com/CoolBrowser'.
spec
...
spec
group: 'default' with: #('CoolBrowser-Core' 'CoolBrowser-Addons' );
group: 'Core' with: #('CoolBrowser-Core' 'CoolBrowser-Platform' );
group: 'Extras' with: #('CoolBrowser-Addons');
group: 'Tests' with: #('CoolBrowser-Tests' 'CoolBrowser-AddonsTests' );
group: 'CompleteWithoutTests' with: #('Core' 'Extras' );
group: 'CompleteWithTests' with: #('CompleteWithoutTests' 'Tests' )].
In this example, we use a different repository for Pharo than for GemStone. However, this is not mandatory, since both can share the same repository and differ in other things, like versions, pre and post code executions,
dependencies, etc.
In addition, the addons and tests are not available for GemStone, and
thus those packages and groups are not included. As you can see, everything that
we have been doing inside the for: #common do: block can be done inside another
for:do: for a specific dialect.
9.13
Milestoning development: symbolic versions
Note that the #stable here overrides the bleeding edge loading behavior
that you would get if you were (foolish enough) to load a baseline (remember,
loading a baseline loads bleeding edge versions). Here we make sure that
the stable version of OmniBrowser for your platform will be loaded (and
not the latest one). The next section is about the different symbolic versions.
Or you can use the special symbolic version #notDefined, as in the following definition of the symbolic version development:
development: spec
<symbolicVersion: #development>
spec for: #common version: #notDefined.
spec for: #'pharo1.1.x' version: '1.6'.
spec for: #'pharo1.2.x' version: '1.6'.
Here it indicates that there is no version for the common tag. Using a symbolic version that resolves to notDefined will result in a
MetacelloSymbolicVersionNotDefinedError being signaled.
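A caller that wants to react to this situation can guard the load with an exception handler. This is only a sketch: it assumes the chapter's running example ConfigurationOfCoolBrowser defines the development symbolic version shown above, and the message written to the Transcript is illustrative.

```smalltalk
"Sketch: resolve the #development symbolic version and report the case
 where it resolves to #notDefined on the current platform."
[ (ConfigurationOfCoolBrowser project version: #development) load ]
    on: MetacelloSymbolicVersionNotDefinedError
    do: [ :error |
        Transcript
            show: 'No #development version on this platform: ';
            show: error messageText printString;
            cr ]
```

The handler covers both the version lookup and the load, so it does not matter at which of the two steps the error is signaled.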
For the development symbolic version you can use any version that you
would like (including another symbolic version). As the following code
shows, we can specify a specific version, a baseline (which will load the
latest versions specified by the baseline) or a stable version.
development: spec
<symbolicVersion: #'development'>
spec for: #'common' version: '1.1'
development: spec
<symbolicVersion: #'development'>
spec for: #'common' version: '1.1-baseline'
development: spec
<symbolicVersion: #'development'>
spec for: #'common' version: #stable
Warning. The term stable is misleading. It does not mean that you will
always load exactly the same version because the developer of the system
you rely on may change the meaning of stable to point to another stable
version. But such a stable version may introduce incompatibility with your
own code. So when you release your code you should use a specific version
to be sure that you will not get impacted by other changes.
is not always the last version. This is because latestVersion answers the latest
version whose blessing is not #development, #broken, or #baseline. To find the
latest #development version for example, you should execute this expression:
ConfigurationOfCoolBrowser project latestVersion: #development.
Nevertheless, you can get the very last version independently of blessing
using the lastVersion method, as illustrated below:
ConfigurationOfCoolBrowser project lastVersion.
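The difference between the two queries can be modelled with a plain list of versions. This is a toy model with hypothetical data, not Metacello code; it assumes that the blessings skipped by latestVersion are #development, #broken and #baseline.

```smalltalk
| versions excluded latest last |
"Hypothetical versions, oldest first, each paired with its blessing."
versions := {
    '0.5' -> #release.
    '0.6-baseline' -> #baseline.
    '0.7' -> #release.
    '0.8' -> #development }.
excluded := #(#development #broken #baseline).
"latestVersion: the most recent version whose blessing is not excluded."
latest := (versions reject: [ :each | excluded includes: each value ]) last key.
"lastVersion: the most recent version, whatever its blessing."
last := versions last key.
{ latest. last }
```

In this model the two answers differ as soon as the most recent version is still blessed #development.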
In general, the #development blessing should be used for any version that
is unstable. Once a version has stabilized, a different blessing should be
applied.
The following expression will load the latest version of all of the packages
for the latest #baseline version:
(ConfigurationOfCoolBrowser project latestVersion: #baseline) load.
Since the latest #baseline version should reflect the most up-to-date project
structure, executing the previous expression loads the absolute bleeding
edge version of the project.
Hints.
Some patterns emerge when working with Metacello. Here is one: Create
a baseline version and use the #stable version for all of the projects in the
baseline. In the literal version, use the explicit version, so that you get an
explicit repeatable specification for a set of projects that were known to work
together.
Here is an example: the Pharo 1.2.2-baseline would include specs that
look like this:
spec
project: 'OB Dev' with: [
spec
className: 'ConfigurationOfOmniBrowser';
versionString: #stable;
...];
project: 'ScriptManager' with: [
spec
className: 'ConfigurationOfScriptManager';
versionString: #stable;
...];
project: 'Shout' with: [
spec
className: 'ConfigurationOfShout';
versionString: #stable;
...];
....].
Loading Pharo 1.2.2-baseline would cause the #stable version for each of
those projects to be loaded ... but remember over time the #stable version
will change and incompatibilities between packages can creep in. By using
#stable versions you will be in better shape than using #bleedingEdge, because
the #stable version is known to work.
Pharo 1.2.2 (literal version) will have corresponding specs that look like
this:
spec
project: 'OB Dev' with: '1.2.4';
project: 'ScriptManager' with: '1.2';
project: 'Shout' with: '1.2.2';
....].
This way, you have driven a stake into the ground stating that these versions
are known to work together (have passed tests as a unit). Five years in the
future, you will be able to load Pharo 1.2.2 and get exactly the same packages
every time, whereas the #stable versions may have drifted over time.
If you are just bringing up a PharoCore1.2 image and would like to load
the Pharo dev code, you should load the #stable version of Pharo (which may
be 1.2.2 today and 1.2.3 tomorrow). If you want to duplicate the environment
that someone is working in, you will ask them for the version of Pharo and
load that explicit version to reproduce the bug or whatever request you may
need.
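The two situations translate into two load expressions. The sketch below reuses the chapter's running example ConfigurationOfCoolBrowser and assumes that it defines a #stable symbolic version and a literal version '0.7'.

```smalltalk
"Follow the moving #stable target: what gets loaded may change over time."
(ConfigurationOfCoolBrowser project version: #stable) load.

"Pin an explicit version to reproduce a known environment."
(ConfigurationOfCoolBrowser project version: '0.7') load.
```

Only one of the two expressions would normally be evaluated in a given image.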
If you use the stable version in your baseline there is no need to do anything special in your version specification.
9.14
Load types
Metacello lets you specify the way packages are loaded through its "load
types". At the time of this writing, there are only two possible load types:
atomic and linear.
Atomic loading is used where packages have been partitioned in such
a way that they cannot be loaded individually. The definitions from each
package are merged together into one giant load by the Monticello package loader. Class side initialize methods and pre/post code execution are performed for the whole set of packages, not individually.
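The contrast between the two load types can be pictured as two different interleavings of loading and initialization. The following is a toy model in plain Smalltalk, not Metacello code; the package names and log entries are hypothetical.

```smalltalk
| packages linearLog atomicLog |
packages := #('Core' 'Addons').

"Linear: each package is loaded and initialized before the next one starts."
linearLog := OrderedCollection new.
packages do: [ :each |
    linearLog add: 'load ' , each; add: 'initialize ' , each ].

"Atomic: all definitions are loaded as one unit, then initialization runs for the whole set."
atomicLog := OrderedCollection new.
packages do: [ :each | atomicLog add: 'load ' , each ].
packages do: [ :each | atomicLog add: 'initialize ' , each ].

{ linearLog asArray. atomicLog asArray }
```

In a configuration, the load type itself is selected on the project, for instance with project loadType: #linear as in the project method shown later in this chapter.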
If you use a linear load, then each package is loaded in order. Class side
initialize methods and pre/post code execution are performed just before or
9.15
Conditional loading
When loading a project, the user usually wants to decide whether or not to load
certain packages depending on a specific condition, for example, the
existence of certain other packages in the image. Suppose you want to load
Seaside in your image. Seaside has a tool that depends on OmniBrowser and
is used for managing instances of web servers. What can be done with this
little tool can also be done by code. If you want to load such a tool, you need
OmniBrowser. However, other users may not need such a package. An alternative could be to provide different groups, one that includes the package
and one that does not. The problem is that the end user has to be aware
of this and load different groups in different situations. With conditional
loading you can, for example, load that Seaside tool only if OmniBrowser is
present in the image. This will be done automatically by Metacello and there
is no need to explicitly load a particular group.
Suppose that our CoolToolSet starts to provide many more features. We
first split the core in two packages: CoolToolSet-Core and CoolToolSet-CB.
CoolBrowser can be present in one image, but not in another one. We want
to load the package CoolToolSet-CB by default only if CoolBrowser is
present.
The mentioned conditionals are achieved in Metacello by using the project
attributes we saw in the previous section. They are defined in the project
method. Example:
ConfigurationOfCoolBrowser>>project
| projectAttributes |
^ project ifNil: [ | constructor |
"Bootstrap Metacello if it is not already loaded"
self class ensureMetacello.
"Construct Metacello project"
constructor := (Smalltalk at: #MetacelloVersionConstructor) on: self.
project := constructor project.
projectAttributes := ((Smalltalk at: #CBNode ifAbsent: []) == nil
ifTrue: [ #( #CBNotPresent ) ]
ifFalse: [ #( #CBPresent ) ]).
project projectAttributes: projectAttributes.
project loadType: #linear.
project ]
As you can see in the code, we check whether the CBNode class (a class from CoolBrowser) is present and, depending on that, we set a specific project attribute.
This is flexible enough to let you define your own conditions and set as many
project attributes as you wish (you can define an array of attributes).
Now the question is how to use these project attributes. In the following
baseline we see an example:
Notice that project attributes are used through the existing method for:do:. Inside that method you can do whatever you want: define groups, dependencies, etc. In our case, if CoolBrowser is present, then
we just add CoolToolSet-CB to the default group. If it is not present, then
'CoolBrowser default' is added as a dependency of CoolToolSet-CB. In this
case, we do not add CoolToolSet-CB to the default group because we do not want that. If
desired, the user should explicitly load that package too.
Again, notice that inside the for:do: you are free to do whatever you want.
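A baseline along the lines described above could look as follows. This is a hedged sketch rather than the original listing: the selector baseline08:, the version string and the repository URLs are hypothetical, and it assumes that group:with: in a later section adds to the group declared in #common.

```smalltalk
ConfigurationOfCoolToolSet>>baseline08: spec
    "Hypothetical baseline: selector, version string and URLs are illustrative."
    <version: '0.8-baseline'>
    spec for: #common do: [
        spec blessing: #baseline.
        spec repository: 'http://www.example.com/CoolToolSet'.
        spec project: 'CoolBrowser default' with: [
            spec
                className: 'ConfigurationOfCoolBrowser';
                versionString: #stable;
                loads: #('default');
                repository: 'http://www.example.com/CoolBrowser' ].
        spec
            package: 'CoolToolSet-Core';
            package: 'CoolToolSet-CB' with: [ spec requires: 'CoolToolSet-Core' ].
        spec group: 'default' with: #('CoolToolSet-Core') ].
    "Applied only when the #CBPresent project attribute is set."
    spec for: #CBPresent do: [
        spec group: 'default' with: #('CoolToolSet-CB') ].
    "Applied only when CoolBrowser is missing: loading CoolToolSet-CB then pulls it in."
    spec for: #CBNotPresent do: [
        spec package: 'CoolToolSet-CB' with: [ spec requires: 'CoolBrowser default' ] ]
```

The Class>>selector header follows the book's listing convention; in the image, the method body is compiled on ConfigurationOfCoolToolSet.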
9.16
9.17
Chapter summary
Metacello Memento
ConfigurationOfCoolToolSet>>baseline06: spec
"could be called differently just a convention"
<version: '0.6-baseline'>
"Convention. Used in the version: method"
spec for: #common do: [
"#common/#pharo/#gemstone/#pharo1.4"
spec blessing: #baseline.
"Important: identifies a baseline"
spec repository: 'http://www.example.com/CoolToolSet'.
"When we depend on other projects"
spec project: 'CoolBrowser default' with: [
spec
className: 'ConfigurationOfCoolBrowser';
"Optional if convention followed"
versionString: #bleedingEdge; "Optional. Could be #stable/#bleedingEdge/specific version"
loads: #('default');
"which packages or groups to load"
file: 'CoolBrowser-Metacello';
"Optional when same as class name"
repository: 'http://www.example.com/CoolBrowser' ];
project: 'CoolBrowser Tests'
copyFrom: 'CoolBrowser default'
"Just to reuse information"
with: [ spec loads: #('Tests').].
"Just to reuse information"
"Our internal package dependencies"
spec
package: 'CoolToolSet-Core';
package: 'CoolToolSet-Tests' with: [ spec requires: #('CoolToolSet-Core') ];
package: 'CoolBrowser-Addons' with: [ spec requires: 'CoolBrowser-Core' ] ;
package: 'CoolBrowser-AddonsTests' with: [
spec requires: #('CoolBrowser-Addons' 'CoolBrowser-Tests' ) ].
spec
group: 'default' with: #('CoolBrowser-Core' 'CoolBrowser-Addons');
group: 'Core' with: #('CoolBrowser-Core');
group: 'Extras' with: #('CoolBrowser-Addons');
group: 'Tests' with: #('CoolBrowser-Tests' 'CoolBrowser-AddonsTests');
group: 'CompleteWithoutTests' with: #('Core' 'Extras');
group: 'CompleteWithTests' with: #('CompleteWithoutTests' 'Tests')
].
ConfigurationOfCoolBrowser>>version07: spec
"could be called differently just a convention"
<version: '0.7' imports: #('0.6-baseline')>
"Convention. No baseline so this is version"
"do not import baseline from other baselines"
spec for: #common do: [
"#common/#pharo/#gemstone/#pharo1.4"
spec blessing: #release.
"Required #development/#release: release means that it will not change
anymore"
spec description: 'In this release .....'.
spec author: 'JohnLewis'.
spec timestamp: '10/12/2009 09:26'.
spec
package: 'CoolBrowser-Core' with: 'CoolBrowser-Core-BobJones.20';
package: 'CoolBrowser-Tests' with: 'CoolBrowser-Tests-JohnLewis.8';
package: 'CoolBrowser-Addons' with: 'CoolBrowser-Addons-JohnLewis.6' ;
package: 'CoolBrowser-AddonsTests' with: 'CoolBrowser-AddonsTests-JohnLewis.1']
ConfigurationOfGemToolsExample>>development: spec
"note that the selector can be anything"
<symbolicVersion: #development>
"#stable/#development/#bleedingEdge"
spec for: #common version: '1.0'.
"1.0 is the version of your development version"
"#common or your platform attributes: #gemstone, #pharo, or #pharo1.4"
ConfigurationOfGemToolsExample>>baseline10: spec
<version: '1.0-baseline'>
spec for: #common do: [
spec blessing: #'baseline'.
"required see above"
spec repository: 'http://seaside.gemstone.com/ss/GLASSClient'.
spec
project: 'FFI' with: [
spec
className: 'ConfigurationOfFFI';
versionString: #bleedingEdge;
"Optional. #stable/#development/#bleedingEdge/specific
version"
repository: 'http://www.squeaksource.com/MetacelloRepository' ];
project: 'OmniBrowser' with: [
spec
className: 'ConfigurationOfOmniBrowser';
versionString: #stable;
"Optional. #stable/#development/#bleedingEdge/specific
version"
repository: 'http://www.squeaksource.com/MetacelloRepository' ];
project: 'Shout' with: [
spec
className: 'ConfigurationOfShout';
versionString: #stable;
repository: 'http://www.squeaksource.com/MetacelloRepository' ];
project: 'HelpSystem' with: [
spec
className: 'ConfigurationOfHelpSystem';
versionString: #stable;
repository: 'http://www.squeaksource.com/MetacelloRepository'].
spec
package: 'OB-SUnitGUI' with: [spec requires: #('OmniBrowser')];
package: 'GemTools-Client' with: [ spec requires: #('OmniBrowser' 'FFI' 'Shout' 'OB-SUnitGUI' ).];
package: 'GemTools-Platform' with: [ spec requires: #('GemTools-Client' ). ];
package: 'GemTools-Help' with: [
spec requires: #('HelpSystem' 'GemTools-Client' ). ].
spec group: 'default' with: #('OB-SUnitGUI' 'GemTools-Client' 'GemTools-Platform' 'GemTools-Help')].
ConfigurationOfGemToolsExample>>version10: spec
<version: '1.0' imports: #('1.0-baseline' )>
spec for: #common do: [
spec blessing: #development.
spec description: 'initial development version'.
spec author: 'dkh'.
spec timestamp: '1/12/2011 12:29'.
spec
project: 'FFI' with: '1.2';
project: 'OmniBrowser' with: #stable;
project: 'Shout' with: #stable;
project: 'HelpSystem' with: #stable.
spec
package: 'OB-SUnitGUI' with: 'OB-SUnitGUI-dkh.52';
package: 'GemTools-Client' with: 'GemTools-Client-NorbertHartl.544';
package: 'GemTools-Platform' with: 'GemTools-Platform.pharo10beta-dkh.5';
package: 'GemTools-Help' with: 'GemTools-Help-DaleHenrichs.24'. ].
Loading. load, load: The load method loads the default group and if there
is no default group defined, then all packages are loaded. The load: method
takes as parameter the name of a package, a project, a group, or a collection
of those items.
(ConfigurationOfCoolBrowser project version: '0.1') load.
(ConfigurationOfCoolBrowser project version: '0.2') load: {'CoolBrowser-Core' . 'CoolBrowser-Addons'}.
Debugging. record, record:, loadDirective. The message record records what would be
loaded for the default group, without actually loading anything; if you want a specific group of items, you can use
record:, just as with load:.
((ConfigurationOfCoolBrowser project version: '0.2') record:
{ 'CoolBrowser-Core' .
'CoolBrowser-Addons' }) loadDirective.
Version development "Since development continues we create a new version"
... "Tagged as development. It will be tagged as release and so on"
Baseline "When architecture or structure changes, a new baseline will appear"
Version development "and the story will continue"
Version release
Part III
Frameworks
Chapter 10
Glamour
with the participation of:
Tudor Girba (tudor@tudorgirba.com)
Browsers are a crucial instrument in understanding complex systems or
models. A browser is a tool to navigate and interact with a particular domain. Each problem domain is accompanied by an abundance of browsers
that are created to help analyze and interpret the underlying elements. The
issue with these browsers is that they are frequently (re)written from scratch,
making them expensive to create and burdensome to maintain. While many
frameworks exist to ease the development of user interfaces in general, they
provide only limited support for simplifying the creation of browsers.
Glamour is a dedicated framework to describe the navigation flow of
browsers. Thanks to its declarative language, Glamour allows one to quickly
define new browsers for their data.
In this chapter we will first detail the creation of some example browsers
to have an overview of the Glamour framework. In a second part, we will
jump into details.
10.1
Now that Glamour is installed, we are ready to build our first browser
by using Glamour's declarative language. What about building a file browser
like Apple's Finder? This browser is built using the Miller Columns
browsing technique, displaying hierarchical elements in a series of columns.
The principle of this browser is that a column always reflects the content of
the element selected in the previous column, the content of the first column being
chosen on opening.
In our case of navigating through the file system, the browser displays a
list of a particular directory's entries (each file and directory) in the first column and then, depending on the user's selection, appends another column
(see Figure 10.1):
if the user selects a directory, the next column will display the entries
of that particular directory;
if the user selects a file, the next column will display the content of the
file.
This may look complex at first because of the recursion. However, Glamour provides an intuitive way of describing Miller Columns-based browsers.
In Glamour's terminology, this particular browser is called a
finder, referring to Apple's Finder found on Mac OS X. Glamour offers
this behavior with the class GLMFinder. This class has to be instantiated and
initialized to properly list our domain of interest, the files:
| browser |
browser := GLMFinder new.
browser show: [:a |
a list
display: #children ].
browser openOn: FileSystem disk root.
Note that at that stage selecting a plain file raises an error. We will understand why and how to fix that situation soon.
From this small piece of code you get a list of all entries (either files or
directories) found at the root of your file system, each line representing either
a file or a directory. If you click on a directory, you can see the entries of
this directory in the next column. The filesystem navigation facilities are
provided by the Filesystem framework, thoroughly discussed in Chapter 3.
This code has some problems however. Each line displays the full print
string of the entry and this is probably not what you want. A typical user
would expect only names of each entry. This can easily be done by customizing the list:
browser show: [:a |
a list
display: #children;
format: #basename ].
This way, the message basename will be sent to each entry to get its name.
This makes the files and directories much easier to read by showing the file
name instead of its full name.
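The reason a symbol such as #basename can stand where a one-argument block is expected is that, in Pharo, a symbol answers value: by sending itself as a message to the argument. The following snippet illustrates this with plain strings; the sample names are hypothetical and no Glamour class is involved.

```smalltalk
| names viaSymbol viaBlock |
names := #('readme.txt' 'src' 'notes.md').
"A symbol used where a one-argument block is expected..."
viaSymbol := names collect: #asUppercase.
"...behaves like the equivalent explicit block."
viaBlock := names collect: [ :each | each asUppercase ].
viaSymbol = viaBlock
```

By the same mechanism, format: #basename can be read as format: [ :each | each basename ].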
Another problem is that the code does not distinguish between files and
directories. If you click on a file, you will get an error because the browser
will send it the message children that it does not understand. To fix that, we
just have to avoid displaying a list of contained entries if the selected element
is a file:
browser show: [:a |
a list
when: #isDirectory;
display: #children;
format: #basename ].
This works well, but the user cannot distinguish a line representing a file from one representing a directory. This can be fixed by, for example, adding a slash at
the end of the file name if it is a directory:
browser show: [:a |
a list
when: #isDirectory;
display: #children;
format: #basenameWithIndicator ].
The last thing we might want to do is to display the contents of the entry
if it is a file. The following gives the final version of the file browser:
| browser |
browser := GLMFinder new
variableSizePanes;
title: 'Find your file';
yourself.
browser show: [:a |
a list
when: #isDirectory;
display: [:each | [each children ]
on: Exception
do: [Array new]];
format: #basenameWithIndicator.
a text
when: #isFile;
display: [:entry | [entry readStream contents]
on: Exception
do:['Can''t display the content of this file'] ] ].
browser openOn: FileSystem disk root.
This code extends the previous one with variable-sized panes and a title. It also
guards the directory listing against errors (for example, missing access permissions) and reads the content of files.
The resulting browser is presented in Figure 10.1.
This short introduction has just presented how to install Glamour and
how to use it to create a simple file browser.
10.2
This section gives a realistic example and details the Glamour framework.
Running example
In the following tutorial we will be creating a simple Smalltalk class navigator. Such navigators are used in many Smalltalk browsers and usually
consist of four panes, which are abstractly depicted in Figure 10.2.
The class navigator functions as follows: Pane 1 shows a list or a tree of
packages, each package containing classes, which make up the organizational
structure of the environment. When a package is selected, pane 2 shows a
list of all classes in the selected package. When a class is selected, pane 3
shows all protocols (a construct to group methods also known as method categories) and all methods of the class are shown on pane 4. When a protocol
is selected in pane 3, only the subset of methods that belong to that protocol
Glamour browsers are composed in terms of panes and the flow of data
between them. In our browser we currently have only one pane displaying
packages. The flow of data is specified by means of transmissions. These
are triggered when certain changes in the browser graphical user interface
occur, such as an item selection in a list. We make our browser more useful
by displaying classes contained in the selected package (see Figure 10.3).
PBE2CodeNavigator>>buildBrowser
browser := GLMTabulator new.
browser
column: #packages;
column: #classes.
browser transmit to: #packages; andShow: [:a | self packagesIn: a].
browser transmit from: #packages; to: #classes; andShow: [:a | self classesIn: a].
PBE2CodeNavigator>>classesIn: constructor
constructor list
display: [:packageName | (self organizer packageNamed: packageName)
definedClasses]
The listing above shows almost all of the core language constructs of
Glamour. Since we want to be able to reference the panes later, we give them
the distinct names packages and classes and arrange them in columns
using the column: keyword. Similarly, a row: keyword exists with which panes
can be organized in rows.
The transmit, to: and from: keywords create a transmission, a directed connection that defines the flow of information from one pane to another. In
this case, we create a link from the packages pane to the classes pane. The
from: keyword signifies the origin of the transmission and to: the destination.
If nothing more specific is stated, Glamour assumes that the origin refers to
the selection of the specified pane. We show how to specify other aspects of
the origin pane and how to use multiple origins below.
Figure 10.3: Two-pane browser. When a package is selected in the left pane,
the contained classes are shown on the right pane.
Finally, the andShow: specifies what to display on the destination pane
when the connection is activated or transmitted. In our example, we want to
show a list of the classes that are contained in the selected package.
The display: keyword simply stores the supplied block within the presentation. The blocks will only be evaluated later, when the presentation should
be displayed on-screen. If no explicit display block is specified, Glamour attempts to display the object in some generic way. In the case of list presentations, this means that the displayString message is sent to the object to retrieve
a standard string representation. As we have previously seen, format: is used
to change this default behavior.
Along with display:, it is possible to specify a when: condition to limit the
applicability of the connection. By default, the only condition is that an item
is in fact selected, i.e., that the argument of the display block is not nil.
Another Presentation
So far, packages are visually represented as a flat list. However, packages
are naturally structured with the corresponding class category. To exploit
this structure, we replace the list with a tree presentation for packages:
PBE2CodeNavigator>>packagesIn: constructor
constructor tree
display: [ :organizer | (self rootPackagesOn: organizer) asSet sorted ];
children: [ :rootPackage :organizer | (self childrenOf: rootPackage on: organizer)
sorted ];
format: #asString
PBE2CodeNavigator>>classesIn: constructor
constructor list
The browser resulting from the above changes is shown in Figure 10.4.
Multiple Origins
Adding the list of methods as Pane 4 involves slightly more machinery.
When a method category is selected we want to show only the methods that
belong to that category. If no category is selected, all methods that belong to
the current class are shown.
This leads to our methods pane depending on the selection of two other
panes, the class pane and the category pane. Multiple origins can be defined
using multiple from: keywords as shown below.
Figure 10.4: Improved class navigator including a tree to display the packages and a list of method categories for the selected class.
PBE2CodeNavigator>>buildBrowser
browser := GLMTabulator new.
browser
column: #packages;
column: #classes;
column: #categories;
column: #methods.
browser transmit to: #packages; andShow: [:a | self packagesIn: a].
browser transmit from: #packages; to: #classes; andShow: [:a | self classesIn: a].
browser transmit from: #classes; to: #categories; andShow: [:a | self categoriesIn: a].
browser transmit from: #classes; from: #categories; to: #methods;
andShow: [:a | self methodsIn: a].
PBE2CodeNavigator>>methodsIn: constructor
constructor list
display: [:class :category |
(class organization listAtCategoryNamed: category) sorted].
constructor list
when: [:class :category | class notNil and: [category isNil]];
display: [:class | class selectors sorted];
allowNil
The listing shows a couple of new properties. First, the multiple origins are reflected in the number of arguments of the blocks that are used
in the display: and when: clauses. Secondly, we are using more than one
presentation: Glamour shows all presentations whose conditions match, in
the order in which they were defined, when the corresponding transmission is
fired.
In the first presentation, the condition matches when all arguments are
defined (not nil); this is the default for all presentations. The second condition matches only when the category is undefined and the class is defined.
When a presentation must be displayed even in the presence of an undefined origin, it is necessary to use allowNil as shown. We can therefore omit
the category from the display block.
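The matching rule can be mimicked with two plain blocks. This is a toy model of the decision, not Glamour's implementation; the presentation labels #methodsOfCategory and #allMethods are hypothetical.

```smalltalk
| defaultCondition noCategoryCondition matching |
"Default condition: every origin value must be defined."
defaultCondition := [ :class :category | class notNil and: [ category notNil ] ].
"Condition of the second presentation in the listing."
noCategoryCondition := [ :class :category | class notNil and: [ category isNil ] ].
"Answer the presentations whose condition matches, in definition order."
matching := [ :class :category |
    | shown |
    shown := OrderedCollection new.
    (defaultCondition value: class value: category)
        ifTrue: [ shown add: #methodsOfCategory ].
    (noCategoryCondition value: class value: category)
        ifTrue: [ shown add: #allMethods ].
    shown asArray ].
{ matching value: Object value: nil. matching value: Object value: 'printing' }
```

With a class but no category, only the second presentation matches; once a category is selected, only the first one does.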
The completed class navigator is displayed in Figure 10.5.
Ports
When we stated that transmissions connect panes this was not entirely correct. More precisely, transmissions are connected to properties of panes
called ports. Such ports consist of a name and a value which accommodates
a particular aspect of state of the pane or its contained presentations. If the
port is not explicitly specified by the user, Glamour uses the selection port by
default. As a result, the following two statements are equivalent:
browser transmit from: #packages; to: #classes; andShow: [:a | ...].
browser transmit from: #packages port: #selection; to: #classes; andShow: [:a | ...].
10.3
Reusing Browsers
One of Glamour's strengths is to use browsers in place of primitive presentations such as lists and trees. This opens up powerful possibilities to compose and nest browsers.
We can then reuse the navigator in the new editor browser as shown
below.
Object subclass: #PBE2CodeEditor
instanceVariableNames: 'browser'
classVariableNames: ''
poolDictionaries: ''
category: 'PBE2-CodeBrowser'.
PBE2CodeEditor class>>open
The listing shows how the browser is used exactly like we would use a
list or other type of presentation. In fact, browsers are a type of presentation.
Evaluating PBE2CodeEditor open opens a browser that embeds the navigator in the upper part and has an empty pane at the lower part. Source
code is not displayed yet because no connection has been made between
the panes so far. The source code is obtained by wiring the navigator with
the text pane: we need both the name of the selected method as well as the
class in which it is defined. Since this information is defined only within
the navigator browser, we must first export it to the outside world by using
sendToOutside:from:. For this we append the following lines to codeNavigator:
PBE2CodeNavigator>>buildBrowser
...
browser transmit from: #classes; toOutsidePort: #selectedClass.
browser transmit from: #methods; toOutsidePort: #selectedMethod.
^ browser
This will send the selection within classes and methods to the selectedClass and selectedMethod ports of the containing pane. Alternatively, we
could have added these lines to the navigatorIn: method in the code editor as follows;
it makes no difference to Glamour:
PBE2CodeEditor>>navigatorIn: constructor
"Alternative way of adding outside ports. There is no need to use this
code and the previous one simultaneously."
| navigator |
We can now view the source code of any selected method, and we have created a modular browser by reusing the class navigator that we had
written earlier. The composed browser described by the listing is shown in
Figure 10.7.
Actions
Navigating through the domain is essential to finding useful elements. However, having a proper set of available actions is essential to letting one interact with the domain. Actions may be defined and associated with a presentation. An action is a block that is evaluated when a keyboard shortcut is
pressed or when an entry in a context menu is clicked. An action is defined
via act:on: sent to a presentation:
PBE2CodeEditor>>sourceIn: constructor
constructor text
display: [:class :method | class sourceCodeAt: method ];
act: [:presentation :class :method | class compile: presentation text] on: $s.
Figure 10.7: Composed browser that reuses the previously described class
navigator to show the source of a selected method.
Multiple Presentations
Frequently, developers wish to provide more than one presentation of a specific object. In our code browser for example, we may wish to show the
classes not only as a list but as a graphical representation as well. Glamour
includes support to display and interact with visualizations created using
the Mondrian visualization engine (presented in Chapter 12). To add a second
presentation, we simply define it in the using: block as well:
PBE2CodeNavigator>>classesIn: constructor
constructor list
when: [:packageName | self organizer includesPackageNamed: packageName ];
display: [:packageName | (self organizer packageNamed: packageName)
definedClasses];
title: 'Class list'.
constructor mondrian
when: [:packageName | self organizer includesPackageNamed: packageName];
painting: [ :view :packageName |
view nodes: (self organizer packageNamed: packageName)
definedClasses.
view edgesFrom: #superclass.
view treeLayout];
title: 'Hierarchy'
Other Browsers
We have essentially used the GLMTabulator which is named after its ability
to generate custom layouts using the aforementioned row: and column: keywords. Additional browsers are provided or can be written by the user.
Browser implementations can be subdivided into two categories: browsers
that have explicit panes, i.e., panes that are declared explicitly by the user, and
browsers that have implicit panes.
The GLMTabulator is an example of a browser that uses explicit panes. With
implicit browsers, we do not declare the panes directly but the browser creates them and the connections between them internally. An example of such
a browser is the Finder, which has been discussed in Section 10.1. Since the
panes are created for us, we need not use the from:to: keywords but can simply specify our presentations:
browser := GLMFinder new.
browser list
display: [:class | class subclasses].
browser openOn: Collection
The listing above creates a browser (shown in Figure 10.9) and opens it to
show a list of subclasses of Collection. Upon selecting an item from the list,
the browser expands to the right to show the subclasses of the selected item.
This can continue indefinitely as long as something to select remains.
10.4
Chapter summary
1 http://www.themoosebook.org/book
Chapter 11
11.1
Roassal is known to work with the versions 1.4, 2.0, 3.0, and 4.0 of Pharo.
A first visualization.
The first visualization we will show represents the Collection class hierarchy.
It defines each class as a box connected with its subclasses. Each box displays
the number of methods and number of instance variables of the represented
class.
view := ROView new.
classElements := ROElement forCollection: Collection withAllSubclasses.
classElements do: [ :c |
    c width: c model instVarNames size.
    c height: c model methods size.
    c + ROBorder.
    c @ RODraggable ].
view addAll: classElements.
associations := classElements
    collect: [ :c |
        (c model superclass = Object)
            ifFalse: [ (view elementFromModel: c model superclass) -> c ] ]
    thenSelect: [ :assoc | assoc isNil not ].
edges := ROEdge linesFor: associations.
view addAll: edges.
ROTreeLayout new on: view elements.
view open
Roassal Easel
2 Note that a Glamour-based easel is also provided, under the Moose section of the World
menu. The Glamour-based Roassal easel is similar to the easel presented here. A dedicated
presentation of this version may be found in the moose book, http://themoosebook.org.
The ROMondrianExample category includes examples created with Mondrian, a domain-specific language built on top of Roassal. These examples
primarily use the ROMondrianViewBuilder class to make a visualization. The
ROExample category directly illustrates Roassal.
11.2
More Elements. Interesting visualizations are likely to contain a large number of elements. Elements may be added either with successive invocations
of add: on a ROView, or in one shot by sending addAll:. Consider:
The code above opens a window with two square elements, with the origin at the top left corner. We first create two elements of size 50 and 100,
respectively, and add them to the view using the addAll: message. We give
both elements a border and make them draggable. Note that in our example the shape and the interaction are added before opening the view. This
can also be done afterwards: even once added and rendered, graphical components can still be modified.
An element may be translated by sending translateBy: or translateTo: with
a point as parameter. The parameter represents the step or the position
in pixels, respectively. The axes are defined as shown in Figure 11.2: the x-axis increases
from left to right and the y-axis from top to bottom.
view := ROView new.
element1 := ROElement new size: 100.
element2 := ROElement new size: 50.
elements := Array with: element1 with: element2.
elements do: [ :el | el + ROBorder @ RODraggable ].
view addAll: elements.
element2 translateBy: 150@150.
view open.
Each element has a resize strategy stored in its resizeStrategy instance variable. By default, the resize strategy is an instance of ROExtensibleParent,
which means a parent will extend its bounds to fit all its child elements.
A number of resize strategies are available; just look at the subclasses of
ROAbstractResizeStrategy, each of which defines a strategy that can be
used by elements.
So far, we have introduced the interactions, the shapes, the child elements, and briefly mentioned the possibility to have an object domain.
Schematically, an element representation looks like Figure 11.3.
Translating the view's camera. A view also answers to the translateBy: and
translateTo: messages. Even if it looks like it, it is not the view that changes its
position but its camera. The camera component of a view, represented by an
instance of ROCamera, is the point of view from which a visualization object
is actually viewed. More about the camera can be found in Section 11.8.
11.3
Detailing shapes
ROElement new
    model: 'foo';
    size: 100;
    + ROLabel.
ROElement new
    size: 100;
    + ROBorder.
ROElement new
    size: 200;
    + (ROBox new
        color: Color green;
        borderColor: Color red;
        borderWidth: 4 ).
11.4
With Roassal it is possible to build links between elements to represent relationships between them. A link between two elements is an instance of the
class ROEdge. By default, an edge is shaped with an instance of RONullShape,
which is the empty shape. Because of this, for an edge to be rendered it needs
to be shaped with a line shape, which can be any subclass of ROAbstractLine.
The following code illustrates the creation of an edge between two elements.
We first create the two elements. We then create the edge using them as parameters and shape it with a line (instance of ROLine) shape. We finally add
the two elements and the edge to the view.
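A minimal sketch of these steps follows. It assumes that an edge is created by sending from:to: to ROEdge, a selector that is not shown elsewhere in this section and should be checked against your Roassal version. The elements and the view use only messages that appear in this chapter, and the variable names are illustrative.

```smalltalk
| view start end edge |
view := ROView new.
"Create the two elements to connect and give them a border"
start := ROElement new size: 20.
end := ROElement new size: 20.
start + ROBorder.
end + ROBorder.
"Move the second element away so that the line is visible"
end translateBy: 60@60.
"Create the edge (from:to: is an assumed constructor) and shape it with a line"
edge := ROEdge from: start to: end.
edge + ROLine.
"Add the two elements and the edge to the view"
view add: start.
view add: end.
view add: edge.
view open.
```

Without the edge + ROLine line the edge would keep its default RONullShape and nothing would be drawn between the two elements.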
Adding shape to an edge. There are several kinds of line shapes to use
besides the standard one, like ROOrthoHorizontalLineShape. All of them are
subclasses of the ROAbstractLine class, including ROLine. Some examples are
shown in Figure 11.12 and Figure 11.13.
edge + ROLine.
edge + ROOrthoHorizontalLineShape.
Figure 11.18: Adding links between each class and its superclass
Now we have each class in the Collection hierarchy with the shape we
want and connected to its superclass. However, we do not see a real
hierarchy. This is because we need an appropriate layout to arrange all the
elements of the view. The next section covers how to apply layouts to elements.
11.5
Layouts
(a) ROGridLayout
(b) ROCircleLayout
(c) ROTreeLayout
(d) ROTreeMapLayout
(e) ROVerticalLineLayout
(f) ROHorizontalLineLayout
of elements are arranged with two layouts. The first one aligns elements
along a vertical line and the second along a horizontal line. We first create elements for the vertical line, apply the ROVerticalLineLayout and shape
them with a label. We then do the same for the second group, using the
ROHorizontalLineLayout and spacing them to avoid overlapping.
classVariableNames: ''
poolDictionaries: ''
category: 'Roassal-Layout'
The instance variable initialPosition defines where the virtual line starts,
that is, where the first element of the line will be located. This variable
is set in an initialize method:
RODiagonalLineLayout >> initialize
    super initialize.
    initialPosition := 0@0.
RODiagonalLineLayout >> initialPosition: aPoint
    initialPosition := aPoint
RODiagonalLineLayout >> initialPosition
    ^ initialPosition
Figure 11.24: Collection class hierarchy with width representing the number
of instance variables and height the number of methods.
11.6
11.7
Some interactions are more complex to set up, like popup elements which
are displayed when the mouse is over an element.
From the available interactions in Roassal, only a few examples are presented here.
ROAbstractPopup
ROAbstractPopup allows elements to react to mouse-over events by displaying a popup. There are two kinds of popups: (i) ROPopup, which by default displays a box with the printString value of the element model; and (ii)
ROPopupView, which displays a custom view.
To add a popup to an element just send the @ message with the ROPopup
class as argument. It is also possible to set up a custom text using the text:
message with a string as parameter.
In the following example, we create an element by sending the spriteOn:
message to the ROElement class, with an arbitrary string as its model. The
resulting element has size 50, a red border and is draggable by the mouse.
We finally add the ROPopup to the element.
view := ROView new.
el := ROElement spriteOn: 'baz'.
el @ ROPopup. "Or with custom text -> (ROPopup text: 'this is custom text')"
view add: el.
view open.
The following example creates a view with five elements. Each one reacts
when the mouse is placed over it by displaying a popup. The popup view is
defined as a block that creates a view with the same number of nodes as the
model of the element under the mouse. For example, and as Figure 11.26 shows,
when passing the mouse over node 3, a popup with three gray boxes
appears.
Figure 11.26: ROPopupView that creates a view with the same number of elements as the model of the element the mouse is over.
RODynamicEdge
A recurrent need when visualizing data elements and their relations is showing outgoing edges when the mouse points to an element. Instead of trying
to get the right mixture of callbacks when entering or leaving the element,
the interaction RODynamicEdge considerably eases the task.
The following example makes some lines appear when the mouse hovers
over some elements:
| rawView el1 el2 el3 |
rawView := ROView new.
rawView add: (el1 := ROBox element size: 20).
rawView add: (el2 := ROBox element size: 20).
rawView add: (el3 := ROBox element size: 20).
ROCircleLayout on: (Array with: el1 with: el2 with: el3).
el1 @ RODraggable.
el2 @ RODraggable.
el3 @ RODraggable.
el1 @ (RODynamicEdge toAll: (Array with: el2 with: el3) using: (ROLine arrowed color: Color red)).
rawView open
ROAnimation
Animations are also interactions in Roassal (i.e., ROAnimation is a subclass
of ROInteraction). All animations are subclasses of ROAnimation. Some animations allow elements to be translated either linearly at a constant speed
(ROLinearMove), with an acceleration (ROMotionMove), or following a mathematical function (ROFunctionMove). ROZoomInMove and ROZoomOutMove perform an animation zooming in or out.
Each animation has a number of cycles to complete, executing each one
by sending the doStep message. A ROAnimation also allows one to set a block
to be executed after the animation is finished, using the after: message. It is
important to notice that any action to be carried out after the animation is
finished must be set before the animation is triggered, otherwise it will not
be executed.
view := ROView new.
element := ROElement new.
element size: 10.
element + (ROEllipse color: Color green).
view add: element.
element translateBy: 30@20.
ROFunctionMove new
    nbCycles: 360;
    blockY: [ :x | (x * 3.1415 / 180) sin * 80 + 50 ];
    on: element.
view open.
11.8
A view's camera represents the point of view from which the space is actually viewed.
When translateBy: or translateTo: messages are sent to a view, what actually
happens is that its camera moves instead of the view itself. The position
of the camera is given by the position message. The camera's position can also be set
manually by sending the same messages, translateBy: or translateTo:, to the camera,
but using negated values as parameters. This means that if the view has to
be translated by 10 pixels horizontally and vertically, we can do it like this:
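As a minimal sketch of the two forms just described, assuming that the camera is reached by sending camera to the view (an accessor this section does not show), and using two separate views so that each form is applied only once:

```smalltalk
| view1 view2 |
view1 := ROView new.
view2 := ROView new.
"First form: translate the view itself by 10 pixels in both directions"
view1 translateBy: 10@10.
"Second form: send the negated offset to the camera (camera is an assumed accessor)"
view2 camera translateBy: (-10)@(-10).
```

According to the description above, both views should end up showing their content from the same point of view.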
A camera has an extent, which is what we are seeing, and a real extent,
which represents the far extent. The extent of the view's camera affects the
way a view is drawn in a canvas. When rendering a view, each point, rectangle or other shape that needs to be drawn will be plotted according to
the camera's extent. This is done by transforming each absolute position into
virtual points relative to the camera's vision. For example, when zooming
in on a view, the content of the extent is stretched to fill the real extent,
which makes objects bigger. The extent and the real extent of the camera are
modified using the extent: and realExtent: accessors, respectively. The camera also
stores the window size of the visualization.
The camera has an altitude from the view, which is computed using the
extent. The smaller the extent is, the lower the camera is located, and vice versa. The altitude of the camera can be set by sending the altitude: message
with a number as parameter. A camera cannot be rotated, only translated.
This also means that the camera is always looking perpendicularly at the
view.
Figure 11.28 illustrates what we have just mentioned. It indicates all of
the information regarding the view with which the camera is associated. We also see that
the visible part of the visualization is given by the camera's extent.
It opens a view with 400 labelled elements and elements are ordered using a grid layout. Pressing the left mouse button zooms in the view. The
right mouse button zooms out. Pressing the m key will open a minimap.
This feature is enabled using the ROMiniMap interaction.
The ROMiniMap opens a new window that gives a complete vision of a visualization. It also eases navigation by using the original view's camera.
The minimap is composed of a smaller version of the visualization and a
lupa (magnifying glass), which represents the currently visible part of the main
view's window.
Coming back to our main example, the interaction is simply added by
sending @ ROMiniMap to a view; pressing m then opens the minimap (Figure 11.29).
which has a different extent than the view's camera. This allows one to see
the same view with different sizes.
The magnifier size represents the visible part of the window and its position is related to the view's camera position. When the view is translated
to a point, the magnifier follows it by changing its position: the point representing the camera position is translated to a point on the ROMiniMapDisplayer
camera extent. And when the view is zoomed in or zoomed out the extent
of the camera is changed, increasing or decreasing the magnifier's size.
11.9
Beyond Pharo
The Roassal Core, a set of packages that define all the main classes, like
ROView, ROElement, ROShape and ROCamera. It also contains all the tests.
The Mondrian DSL, composed of the Roassal-Builder and Roassal-Builder-Tests packages.
The platform-dependent packages, which are dedicated to each
Smalltalk dialect Roassal is ported to.
In the platform-dependent packages several classes must be implemented. The main ones are a native canvas class, where a view can be
rendered, and a widget factory class, which can return an object to contain the canvas and receive and delegate all the external events. The first
must be a subclass of ROAbstractCanvas and the second must be a subclass of
RONativeWidgetFactory.
The ROPlatform class defines how the bridge between the core and the
dependent packages must be implemented. This class defines instance
variables, like canvasClass and widgetFactory, which store the corresponding
classes to use according to their name. Each platform-dependent package
must implement its own platform class as a subclass of ROPlatform and reference all the implemented platform-dependent classes. Internally, every
time one of these classes is needed, the core relies on the current instance of
ROPlatform to return the needed class.
11.10
Chapter summary
Roassal enables any graph of objects to be visualized. This chapter has reviewed the main features of Roassal:
Create graphical elements and shape them to look as desired.
Create edges to represent relationships between graphical elements.
Apply layouts to arrange collections of elements automatically.
Make elements react to events by setting callbacks and defined interactions.
Move the visualization point of view, by interacting with its camera.
Screenshots, online examples and screencasts about Roassal may be found at http://objectprofile.com.
Acknowledgment. We thank Chris Thorgrimsson and ESUG for supporting the development of Roassal.
We are very grateful to Nicolas Rosselot Urrejola and Stéphane Ducasse
for their reviews. We also thank Emmanuel Pietriga and Tudor Girba for the
multiple discussions we had about the design of Roassal.
Chapter 12
12.1
Mondrian is based on Roassal. Check the Roassal chapter for installation procedures. If you are using a Moose distribution of Pharo 2 , then you already
have Roassal.
1 http://themoosebook.org/book/internals/mondrian
2 http://www.moosetechnology.org/
A First Visualization
You can get a first visualization by entering and executing the following code
in a workspace. You should see
the Collection class hierarchy.
| view |
view := ROMondrianViewBuilder new.
view shape rectangle
    width: [ :cls | cls numberOfVariables * 5 ];
    height: #numberOfMethods;
    linearFillColor: #numberOfLinesOfCode within: Collection withAllSubclasses.
view interaction action: #browse.
view nodes: Collection withAllSubclasses.
view edgesFrom: #superclass.
view treeLayout.
view open
12.2
To define shapes, use the shape message followed by the desired shape
with its characteristics, before the node or nodes definition. This will locally
define the shape for the nodes.
view := ROMondrianViewBuilder new.
view shape rectangle
    size: 10;
    color: Color red.
view node: 1.
view open.
By using the nodes: message with a collection of objects you can create
several nodes.
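For instance, the following sketch reuses the shape definition above and creates one node per number. The range 1 to: 5 is an arbitrary choice made for illustration.

```smalltalk
| view |
view := ROMondrianViewBuilder new.
"The shape must be defined before the nodes it applies to"
view shape rectangle
    size: 10;
    color: Color red.
"One node is created for each object of the collection"
view nodes: (1 to: 5).
view open.
```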
If the node or nodes have nested nodes, use the node:forIt: or nodes:forEach:
message to add them. The second parameter is a block which will add the
nested nodes, as the following code shows:
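A small sketch of nested nodes, under the assumption that the block passed to nodes:forEach: receives the model of each outer node (here a number); the ranges are arbitrary and chosen only for illustration:

```smalltalk
| view |
view := ROMondrianViewBuilder new.
"Each outer node n is assumed to receive its model n in the block,
and gets n nested nodes"
view nodes: (1 to: 3) forEach: [ :n |
    view nodes: (1 to: n) ].
view open.
```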
view treeLayout.
view open.
There are essentially two ways to work with Mondrian, either using the
easel or a view renderer. The easel is a tool in which users may interactively
and incrementally build a visualization by means of a script. The easel is particularly useful when prototyping. MOViewRenderer enables a visualization to
be programmatically built, in a non-interactive fashion. You probably want
to use this class when embedding your visualization in your application.
We will first use Mondrian in its easiest way, by using the easel. To open
an easel, you can either use the World menu (it should contain the entry
Mondrian Easel) or execute the expression:
ROEaselMorphic open.
In the easel you have just opened, you can see two panels: the one on
top is the visualization panel, the second one is the script panel. In the script
panel, enter the following code and press the generate button:
view nodes: (1 to: 20).
You should see in the top pane 20 small boxes lined up in the top left
corner. You have just rendered the numerical set between 1 and 20. Each
box represents a number. The amount of interaction you can do is quite
limited for now. You can only drag and drop a number and get a tooltip that
indicates its value. We will soon see how to define interactions. For now, let
us explore the basic drawing capabilities of Mondrian.
We can add edges between nodes that we have already drawn. Add a second
line:
view nodes: (1 to: 20).
view edgesFrom: [ :v | v * 2 ].
Each number is linked with its double. Not all the doubles are visible.
For example, the double of 20 is 40, which is not part of the visualization. In
that case, no edge is drawn.
The message edgesFrom: defines one edge per node, when possible. For
each node that has been added to the visualization, an edge is defined between this node and the node looked up from the value of the provided block.
Mondrian contains a number of layouts to order nodes. Here, we use the
circle layout:
view nodes: (1 to: 20).
view edgesFrom: [ :v | v * 2 ].
view circleLayout.
12.3
We will now visualize Pharo classes. For the remainder of this section, we
will make intensive use of the reflective capabilities of Pharo to introspect the collection class hierarchy. This will provide compelling examples. Let's visualize
the hierarchy of classes contained in the Collection framework:
view nodes: Collection withAllSubclasses.
12.4
Reshaping nodes
Mondrian visualizes graphs of objects. Each object of the domain is associated with a graph element, a node or an edge. Graph elements are not aware
of their graphical representation. Their graphical aspect is given by a shape.
So far, we have solely used the default shape to represent nodes and edges.
The default shape of a node is a five-pixel wide square and the default shape
of an edge is a thin, straight, and gray line.
A number of dimensions defines the appearance of a shape: the width
and the height of a rectangle, the size of a line dash, border and inner colors,
for example. We will reshape the nodes of our visualization to convey more
information about the internal structure of the classes we are visualizing.
Consider:
view shape rectangle
    width: [ :each | each instVarNames size * 3 ];
    height: #numberOfMethods.
view nodes: Collection withAllSubclasses.
view edgesFrom: #superclass.
view treeLayout.
Figure 12.1 shows the result. Each class is represented as a box. The
Collection class (the root of the hierarchy) is the topmost box. The width
Collection, SequenceableCollection, String, CompiledMethod. Classes with more variables than others are: RunArray and SparseLargeTable.
Figure 12.1: The system complexity for the collection class hierarchy.
12.5
Multiple edges
The message edgesFrom: is used to draw one edge at most per node. A variant
of it is edges:from:toAll:. It supports the definition of several edges starting
from a given node. Consider the dependencies between classes. The script:
Mondrian provides a set of utility methods to easily create elements. Consider the expression:
itself equivalent to
view
    edges: Collection withAllSubclasses
    from: [ :each | each superclass ]
    to: [ :each | each yourself ].
12.6
Colored shapes
A shape may be colored in various ways. Node shapes understand the messages fillColor:, textColor: and borderColor:. Line shapes understand color:. Let's color
the visualization of the collection hierarchy:
view shape rectangle
    size: 10;
    borderColor: [ :cls | ('*Array*' match: cls name)
        ifTrue: [ Color blue ]
        ifFalse: [ Color black ] ];
    fillColor: [ :cls | cls hasAbstractMethods
        ifTrue: [ Color lightGray ]
        ifFalse: [ Color white ] ].
view nodes: Collection withAllSubclasses.
view edgesFrom: #superclass.
view treeLayout.
Figure 12.3: Abstract classes are in gray and classes with the word Array
in their name are in blue.
As with height: and width:, the messages that define colors take either a
symbol, a block or a constant value as argument. The argument is evaluated
against the domain object represented by the graphical element (a double
dispatch sends the message moValue: to the argument). Using ifTrue:ifFalse:
in such blocks is verbose. Utility methods are provided to
easily pick a color according to a particular condition. The definition of the shape
can simply be:
view shape rectangle
    size: 10;
    if: [ :cls | ('*Array*' match: cls name) ] borderColor: Color blue;
    if: [ :cls | cls hasAbstractMethods ] fillColor: Color lightGray;
    ...
12.7
More on colors
Colors are pretty useful to designate a property (e.g., gray if the class is abstract). They may also be employed to represent a continuous distribution.
For example, the color intensity may indicate the result of a metric. Consider
the previous script in which the node color intensity conveys the number of
lines of code:
view interaction action: #browse.
view shape rectangle
    width: [ :each | each instVarNames size * 3 ];
    height: [ :each | each methods size ];
    linearFillColor: #numberOfLinesOfCode within: Collection withAllSubclasses.
view nodes: Collection withAllSubclasses.
view edgesFrom: #superclass.
view treeLayout.
Figure 12.4: The system complexity visualization: nodes are classes; height
is the number of methods; width the number of variables; color conveys the number of lines of code.
A color may be assigned to an object identity using identityFillColorOf:. The
argument is either a block or a symbol, evaluated against the domain object.
A color is associated with the result of the argument.
12.8
Popup view
Let's return to the abstract class example. The following script indicates
abstract classes and how many abstract methods they define:
Figure 12.5: Boxes are classes and links are inheritance relationships. The
number of abstract methods is indicated by the size of the class. A red class
defines abstract methods and a pink class solely inherits from an abstract
class.
Figure 12.5 indicates classes that are abstract either by inheritance or by
defining abstract methods. Class size indicates the number of abstract methods defined.
The popup message can be enhanced to list abstract methods. Putting the
mouse above a class does not only give its name, but also the list of abstract
methods defined in the class. The following piece of code has to be added at
the beginning:
view interaction popupText: [ :aClass |
    | stream |
    stream := WriteStream on: String new.
    (aClass methods select: #isAbstract thenCollect: #selector)
        do: [ :sel | stream nextPutAll: sel; nextPut: $ ; cr ].
    aClass name printString, ' => ', stream contents ].
...
So far, we have seen that an element has a shape to describe its graphical
representation. It also contains an interaction that contains event handlers.
The message popupText: takes a block as argument. This block is evaluated
with the domain object as argument. The block has to return the popup text
content. In our case, it is simply a list of the methods.
In addition to a textual content, Mondrian allows a view to be popped
up. We will enhance the previous example to illustrate this point. When the
mouse enters a node, a new view is defined and displayed next to the node.
view interaction popupView: [ :element :secondView |
12.9
Subviews
Figure 12.6: Large boxes are classes. Inner boxes are methods. Edges show a
possible invocation between the two.
12.10
Forwarding events
12.11
Events
Each mouse movement, click and keyboard keystroke corresponds to a particular event. Mondrian offers a rich hierarchy of events. The root of the
hierarchy is MOEvent. To associate a particular action to an event, a handler
has to be defined on the object interaction. In the following example, clicking
on a class opens a code browser:
view shape rectangle
    width: [ :each | each instVarNames size * 5 ];
    height: [ :each | each methods size ];
    if: #hasAbstractMethods fillColor: Color lightRed;
    if: [:cls | cls methods anySatisfy: #isAbstract ] fillColor: Color red.
view interaction on: ROMouseClick do: [ :event | event model browse ].
view nodes: Collection withAllSubclasses.
view edgesFrom: #superclass.
view treeLayout.
The block handler accepts one argument: the event generated. The object that triggered the event is obtained by sending modelElement to the event
object.
12.12
Interaction
Mondrian offers a number of contextual interaction mechanisms. The interaction object contains a number of keywords for that purpose. The message
highlightWhenOver: takes a block as argument. This block returns a list of the
nodes to highlight when the mouse enters a node. Consider the example:
view interaction
    highlightWhenOver: [:v | {v - 1 . v + 1. v + 4 . v - 4}].
view shape rectangle
    width: 40;
    height: 30;
    withText.
view nodes: (1 to: 16).
view gridLayout gapSize: 2.
hand side a hierarchy of unit tests is displayed. Locating the mouse pointer
above a unit test highlights the classes that are referenced by one of the unit
test methods. Consider the (rather long) script:
The script contains two parts. The first part is the ubiquitous system
complexity of the collection framework. The second part renders the tests
contained in CollectionsTests. The width of a class is the number of literals contained in it. The height is the number of lines of code. Since the
collection tests make heavy use of traits to reuse code, these metrics have
to be scaled down. When the mouse is placed over a unit test, all the
classes of the collection framework referenced in this class are highlighted.
12.13
Chapter summary
Part IV
Language
Chapter 13
Handling Exceptions
with the participation of:
Clément Bera (bera.clement@gmail.com)
All applications have to deal with exceptional situations. Arithmetic errors may occur (such as division by zero), unexpected situations may arise
(file not found), or resources may be exhausted (network down, disk full,
etc.). The old-fashioned solution is to have operations that fail return a special error code; this means that client code must check the return value of each
operation, and take special action to handle errors. This leads to brittle code.
With the help of a series of examples, we shall explore all of these possibilities, and take a closer look into the internal mechanics of exceptions and
exception handlers.
13.1
Introduction
two things: it captures essential information about the context in which the
exception occurred, and transfers control to the exception handler, written
by the client, which decides what to do about it. The essential information
about the context is saved in an Exception object; various classes of Exception
are specified to cover the varied exceptional situations that may arise.
Pharo's exception-handling mechanism is particularly expressive and
flexible, covering a wide range of possibilities. Exception handlers can be
used to ensure that certain actions take place even if something goes wrong,
or to take action only if something goes wrong. Like everything in Smalltalk,
exceptions are objects, and respond to a variety of messages. When an exception is caught by a handler, there are many possible responses: the handler
can specify an alternative action to perform; it can ask the exception object
to resume the interrupted operation; it can retry the operation; it can pass the
exception to another handler; or it can raise a completely different exception.
13.2
Ensuring execution
The ensure: message can be sent to a block to make sure that, even if the block
fails (e.g., raises an exception) the argument block will still be executed:
anyBlock ensure: ensuredBlock
This code ensures that the writer file handle will be closed, even if an error
occurs in Form fromUser or while writing to the file.
Here is how it works in more detail. The nextPutImage: method of the class
GIFReadWriter converts a form (i.e., an instance of the class Form, representing
a bitmap image) into a GIF image. This method writes into a stream which
has been opened on a file. The nextPutImage: method does not close the stream
it is writing to, therefore we should be sure to close the stream even if a problem arises while writing. This is achieved by sending the message ensure: to
the block that does the writing. In case nextPutImage: fails, control will flow
into the block passed to ensure:. If it does not fail, the ensured block will still
be executed. So, in either case, we can be sure that writer is closed.
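The same behaviour can be observed without touching files. The following sketch records each step in a collection; the variable names log and result are illustrative, and ZeroDivide is the standard Pharo exception signaled by a division by zero.

```smalltalk
| log result |
log := OrderedCollection new.
"Normal case: the ensured block runs after the receiver,
and ensure: answers the value of the receiver block"
result := [ log add: 'working'. 42 ]
    ensure: [ log add: 'cleanup' ].
"Failing case: the ensured block still runs when the stack is unwound
because the handler returns"
[ [ 1 / 0 ] ensure: [ log add: 'cleanup after failure' ] ]
    on: ZeroDivide
    do: [ :ex | ex return: nil ].
```

After evaluation, result holds 42 and log contains 'working', 'cleanup' and 'cleanup after failure', in that order.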
Here is another use of ensure:, in class Cursor:
Cursor>>showWhile: aBlock
    "While evaluating the argument, aBlock,
    make the receiver be the cursor shape."
    | oldcursor |
    oldcursor := Sensor currentCursor.
    self show.
    ^aBlock ensure: [ oldcursor show ]
13.3
"not 0"
Open a transcript and evaluate the code above in a workspace. When the pre-debugger window opens, first try selecting Proceed and then Abandon. Note that
the argument to ifCurtailed: is evaluated only when the receiver terminates abnormally. What happens when you select Debug?
Here are some examples of ifCurtailed: usage: the text of the Transcript show:
describes the situation:
[^ 10] ifCurtailed: [Transcript show: 'This is displayed'; cr]
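A further sketch, which records what happens in a collection instead of the Transcript so that the result can be inspected afterwards; the variable name log is illustrative:

```smalltalk
| log |
log := OrderedCollection new.
"The receiver finishes normally: the argument block is not evaluated"
[ 10 ] ifCurtailed: [ log add: 'not reached' ].
"The receiver is curtailed: the handler returns, the stack is unwound
through the protected block, and the argument block is evaluated"
[ [ 1 / 0 ] ifCurtailed: [ log add: 'curtailed' ] ]
    on: ZeroDivide
    do: [ :ex | ex return: nil ].
```

Only the second expression adds an entry, so log ends up containing the single string 'curtailed'.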
Both ensure: and ifCurtailed: are very useful for making sure that important
cleanup code is executed, but are not by themselves sufficient for handling
all exceptional situations. Now let's look at a more general mechanism for
handling exceptions.
13.4
Exception handlers
The general mechanism is provided by the message on:do:. It looks like this:
aBlock on: exceptionClass do: handlerAction
aBlock is the code that detects an abnormal situation and signals an exception; it is called the protected block. handlerAction is the block that is evaluated if an
exception is signaled; it is called the exception handler. exceptionClass defines
the class of exceptions that handlerAction will be asked to handle.
The message on:do: returns the value of the receiver (the protected block)
and when an error occurs it returns the value of the handlerAction block as
illustrated by the following expressions:
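A minimal sketch of both cases, using the standard ZeroDivide exception; the variable names and the value 33 are arbitrary:

```smalltalk
| normal failed |
"No exception is signaled: on:do: answers the value of the protected block"
normal := [ 1 + 2 ] on: ZeroDivide do: [ :ex | 33 ].
"ZeroDivide is signaled: on:do: answers the value of the handler block"
failed := [ 1 / 0 ] on: ZeroDivide do: [ :ex | 33 ].
```

Here normal is 3 and failed is 33.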
The beauty of this mechanism lies in the fact that the protected block can
be written in a straightforward way, without regard to any possible errors. A
single exception handler is responsible for taking care of anything that may
go wrong.
Consider the following example where we want to copy the contents of
one file to another. Although several file-related things could go wrong, with
exception handling, we simply write a straight-line method, and define a
single exception handler for the whole transaction:
| source destination fromStream toStream |
source := 'log.txt'.
destination := 'log-backup.txt'.
[ fromStream := FileStream oldFileNamed: (FileSystem workingDirectory / source).
[ toStream := FileStream newFileNamed: (FileSystem workingDirectory / destination).
[ toStream nextPutAll: fromStream contents ]
ensure: [ toStream close ] ]
ensure: [ fromStream close ] ]
on: FileStreamException
do: [ :ex | UIManager default inform: 'Copy failed -- ', ex description ].
If any exception other than FileStreamException happens, the files are not
properly closed.
13.5
Without exceptions, one (bad) way to handle a method that may fail to produce an expected result is to introduce explicit error codes as possible return
values. In fact, in languages like C, code is littered with checks for such error codes, which often obscure the main application logic. Error codes are
also fragile in the face of evolution: if new error codes are added, then all
clients must be adapted to take the new codes into account. By using exceptions instead of error codes, the programmer is freed from the task of explicitly checking each return value, and the program logic stays uncluttered.
Moreover, because exceptions are classes, as new exceptional situations are
discovered, they can be subclassed; old clients will still work, although they
may provide less-specific exception handling than newer clients.
If Smalltalk did not provide exception-handling support, then the tiny
example we saw in the previous section would be written something like
this, using error codes:
"Pseudo-code -- luckily Smalltalk does not work like this. Without the
benefit of exception handling we must check error codes for each operation."
source := 'log.txt'.
destination := 'log-backup.txt'.
success := 1. "define two constants, our error codes"
failure := 0.
fromStream := FileStream oldFileNamed: (FileSystem workingDirectory / source).
fromStream ifNil: [
UIManager default inform: 'Copy failed -- could not open', source.
^ failure "terminate this block with error code" ].
toStream := FileStream newFileNamed: (FileSystem workingDirectory / destination).
toStream ifNil: [
fromStream close.
UIManager default inform: 'Copy failed -- could not open', destination.
What a mess! Without exception handling, we must explicitly check the result of each operation before proceeding to the next. Not only must we check
error codes at each point where something might go wrong, but we must also
be prepared to clean up any operations performed up to that point and abort
the rest of the code.
13.6
If you are wondering how this works, have a look at the implementation
of the class-side method , (comma) in Exception:
Exception class>>, anotherException
    "Create an exception set."
    ^ ExceptionSet new add: self; add: anotherException; yourself
The rest of the magic occurs in the class ExceptionSet, which has a surprisingly simple implementation.
Object subclass: #ExceptionSet
    instanceVariableNames: 'exceptions'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Exceptions-Kernel'
ExceptionSet>>initialize
    super initialize.
    exceptions := OrderedCollection new
ExceptionSet>>, anException
    self add: anException.
    ^ self
ExceptionSet>>add: anException
    exceptions add: anException
ExceptionSet>>handles: anException
    exceptions do: [:ex | (ex handles: anException) ifTrue: [^ true]].
    ^ false
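From the client side, an exception set is simply built with a comma between exception classes. The sketch below is our own illustration (the block name classify and the selector #zork are made up); it uses one handler for two unrelated exception classes.

```smalltalk
| classify |
"One handler protects against either kind of exception."
classify := [ :aBlock |
    aBlock
        on: ZeroDivide, MessageNotUnderstood
        do: [ :ex | ex class name ] ].

classify value: [ 1/0 ].
"answers #ZeroDivide"

"perform: is used so that the unknown selector is sent at run time."
classify value: [ 3 perform: #zork ].
"answers #MessageNotUnderstood"
```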
13.7
Signaling an exception
with the exception as its sole argument. We will see shortly some of the ways
in which the handler can use the exception object.
When signaling an exception, it is possible to provide information specific to the situation just encountered, as illustrated in the code below. For
example, if the file to be opened does not exist, the name of the non-existent
file can be recorded in the exception object:
StandardFileStream class>>oldFileNamed: fileName
"Open an existing file with the given name for reading and writing. If the name has no
directory part, then default directory will be assumed. If the file does not exist, an
exception will be signaled. If the file exists, its prior contents may be modified or
replaced, but the file will not be truncated on close."
| fullName |
fullName := self fullName: fileName.
^(self isAFileNamed: fullName)
ifTrue: [self new open: fullName forWrite: true]
ifFalse: ["File does not exist..."
(FileDoesNotExistException new fileName: fullName) signal]
The exception handler may make use of this information to recover from
the abnormal situation. The argument ex in an exception handler [:ex | ...] will
be an instance of FileDoesNotExistException or of one of its subclasses. Here the
exception is queried for the filename of the missing file by sending it the
message fileName.
| result |
result := [(StandardFileStream oldFileNamed: 'error42.log') contentsOfEntireFile]
on: FileDoesNotExistException
do: [:ex | ex fileName , ' not available'].
Transcript show: result; cr
Every exception has a default description that is used by the development tools to report exceptional situations in a clear and comprehensible
manner. To make the description available, all exception objects respond to
the message description. Moreover, the default description can be changed by
sending the message messageText: aDescription, or by signaling the exception
using signal: aDescription.
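As a small sketch of these messages (our own example; the text 'disk is full' is arbitrary), the handler can read back the text supplied when the exception was signaled:

```smalltalk
| text described |
"signal: sets the message text and signals in one step."
text := [ Error signal: 'disk is full' ]
    on: Error
    do: [ :ex | ex messageText ].
"text is 'disk is full'"

"messageText: sets the text first; signal is sent afterwards via the cascade."
described := [ Error new messageText: 'disk is full'; signal ]
    on: Error
    do: [ :ex | ex description ].
"described is 'Error: disk is full'"
```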
Another example of signaling occurs in the doesNotUnderstand: mechanism, a pillar of the reflective capabilities of Smalltalk. Whenever an object is
sent a message that it does not understand, the VM will (eventually) send it
the message doesNotUnderstand: with an argument representing the offending
message. The default implementation of doesNotUnderstand:, defined in class
Object, simply signals a MessageNotUnderstood exception, causing a debugger
to be opened at that point in the execution.
The doesNotUnderstand: method illustrates the way in which exception-specific information, such as the receiver and the message that is not understood, can be stored in the exception, and thus made available to the
debugger.
Object>>doesNotUnderstand: aMessage
    "Handle the fact that there was an attempt to send the given message to the receiver
    but the receiver does not understand this message (typically sent from the machine
    when a message is sent to the receiver and no method is defined for that selector)."
    MessageNotUnderstood new
        message: aMessage;
        receiver: self;
        signal.
    ^ aMessage sentTo: self.
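A handler can read the same information back from the exception. The following sketch is our own (the selector #zork is deliberately one that integers do not understand):

```smalltalk
| info |
"perform: sends the unknown selector at run time."
info := [ 3 perform: #zork ]
    on: MessageNotUnderstood
    do: [ :ex | ex return: ex receiver -> ex message selector ].
info
"info is the association 3->#zork"
```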
That completes our description of how exceptions are used. The remainder of this chapter discusses how exceptions are implemented and adds
some details that are relevant only if you define your own exceptions.
13.8
Finding handlers
We will now take a look at how exception handlers are found and fetched
from the execution stack when an exception is signaled. However, before
we do this, we need to understand how the control flow of a program is
internally represented in the virtual machine.
At each point in the execution of a program, the execution stack of the
program is represented as a list of activation contexts. Each activation context represents a method invocation and contains all the information needed
for its execution, namely its receiver, its arguments, and its local variables. It
also contains a reference to the context that triggered its creation, i.e., the activation context associated with the method execution that sent the message
that created this context. In Pharo, the class MethodContext (whose superclass
is ContextPart) models this information. The references between activation
contexts link them into a chain: this chain of activation contexts is Smalltalk's
execution stack.
Suppose that we attempt to open a FileStream on a non-existent file from
a doIt. A FileDoesNotExistException will be signaled, and the execution stack
will contain MethodContexts for doIt, oldFileNamed:, and signal, as shown in Figure 13.2.
Since everything is an object in Smalltalk, we would expect method contexts to be objects. However, some Smalltalk implementations use the native
C execution stack of the virtual machine to avoid creating objects all the time.
The current Pharo virtual machine does actually use full Smalltalk objects all
the time; for speed, it recycles old method context objects rather than creating a new one for each message-send.
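The chain of contexts can be followed from a workspace using the pseudo-variable thisContext and the message sender. This is only a sketch of our own; the exact contexts printed depend on the Pharo version and on how the code is evaluated.

```smalltalk
| ctx depth |
ctx := thisContext.
depth := 0.
"Follow the sender links until the bottom of the stack is reached."
[ ctx isNil ] whileFalse: [
    Transcript show: ctx printString; cr.
    depth := depth + 1.
    ctx := ctx sender ].
depth
"depth is the number of contexts that were on the stack"
```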
Without the second handler, the nested exception will not be caught, and
the debugger will be invoked.
An alternative would be to specify the second handler within the first
one:
result := [ Error signal: 'error 1' ]
on: Exception
do: [[ Error signal: 'error 2' ]
on: Exception
do: [:ex | ex description ]].
result --> 'Error: error 2'
13.9
Handling exceptions
When an exception is signaled, the handler has several choices about how to
handle it. In particular, it may:
(i) abandon the execution of the protected block by simply specifying an
alternative result (this is part of the protocol, but it is not used since it is
similar to return);
(ii) return an alternative result for the protected block by sending return:
aValue to the exception object;
(iii) retry the protected block, by sending retry, or try a different block by
sending retryUsing:;
(iv) resume the protected block at the failure point by sending resume or
resume:;
(v) pass the caught exception to the enclosing handler by sending pass; or
(vi) resignal a different exception by sending resignalAs: to the exception.
We will briefly look at the first three possibilities, and then take a closer
look at the remaining ones.
The handler takes over from the point where the error is signaled, and
any code following in the original block is not evaluated.
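The following sketch of our own makes this visible: the statement after the signal is never executed, and the value given to return: becomes the value of the whole on:do: expression.

```smalltalk
| log result |
log := OrderedCollection new.
result := [ log add: 'before'.
            Error signal: 'boom'.
            log add: 'after'.   "skipped: the handler takes over"
            1 ]
    on: Error
    do: [ :ex | ex return: 100 ].
"result is 100; log holds only 'before'"
```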
The ANSI standard is not clear regarding the difference between using
do: [:ex | 100 ] and do: [:ex | ex return: 100] to return a value. We suggest that you
use return: since it is more intention-revealing, even if these two expressions
have the same effect in Pharo.
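The message retry re-evaluates the protected block from the start. As a minimal sketch of our own (not taken from the Pharo sources), the protected block below fails on its first two attempts and succeeds on the third:

```smalltalk
| attempts result |
attempts := 0.
result := [ attempts := attempts + 1.
            attempts < 3 ifTrue: [ Error signal: 'not ready yet' ].
            attempts ]
    on: Error
    do: [ :ex | ex retry ].
result
"result is 3: the block was evaluated three times"
```

Note that a handler that always retries a block that always fails loops forever, so the handler should normally change something before sending retry.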
The message retryUsing: aNewBlock enables the protected block to be replaced by aNewBlock. This new block is executed and is protected with the
same handler as the original block.
x := 0.
result := [ x/x ] "fails for x=0"
on: Error
do: [:ex |
x := x + 1.
ex retryUsing: [1/((x-1)*(x-2))] "fails for x=1 and x=2"
].
result --> (1/2) "succeeds when x=3"
As another example, keep in mind the file handling code we saw earlier
in which we printed a message to the Transcript when a file is not found.
Instead, we could prompt for the file as follows:
Resuming execution
A method that signals an exception that isResumable can be resumed at the
place immediately following the signal. An exception handler may therefore
perform some action, and then resume the execution flow. This behavior is
achieved by sending resume: to the exception in the handler. The argument
is the value to be used in place of the expression that signaled the exception.
In the following example we signal and catch MyResumableTestError, which is
defined in the Tests-Exceptions category:
result := [ | log |
log := OrderedCollection new.
log addLast: 1.
log addLast: MyResumableTestError signal.
log addLast: 2.
log addLast: MyResumableTestError signal.
log addLast: 3.
log ]
on: MyResumableTestError
do: [ :ex | ex resume: 0 ].
result --> an OrderedCollection(1 0 2 0 3)
Here we can clearly see that the value of MyResumableTestError signal is the
value of the argument to the resume: message.
The message resume is equivalent to resume: nil.
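The same can be observed with Warning, which is resumable. In this sketch of our own, the value of the signal expression is whatever the handler passes to resume:, or nil when plain resume is used.

```smalltalk
| withValue withoutValue |
"The signal expression evaluates to 41, so the block answers 42."
withValue := [ (Warning signal: 'low disk space') + 1 ]
    on: Warning
    do: [ :ex | ex resume: 41 ].
"withValue is 42"

"Plain resume makes the signal expression evaluate to nil."
withoutValue := [ Warning signal: 'low disk space' ]
    on: Warning
    do: [ :ex | ex resume ].
"withoutValue is nil"
```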
The usefulness of resuming an exception is illustrated by the following
functionality which loads a package. When installing packages, warnings
may be signaled and should not be considered fatal errors, so we should
simply ignore the warning and continue installing.
The class PackageInstaller does not exist, though here is a sketch of a possible implementation.
PackageInstaller>>installQuietly: packageNameCollection
    ....
    [ self install ] on: Warning do: [ :ex | ex resume ].
ResumableLoader>>readOptionsFrom: aStream
    | option |
    [aStream atEnd]
        whileFalse: [option := self parseOption: aStream.
            "nil if invalid"
            option isNil
                ifTrue: [InvalidOption signal]
                ifFalse: [self addOption: option]].
Note that, to be sure that the stream is closed, the stream close should be guarded
by an ensure: invocation.
Depending on user input, the handler in readConfiguration might return
nil, or it might resume the exception, causing the signal message send in
readOptionsFrom: to return and the parsing of the options stream to continue.
Note that InvalidOption must be resumable; it suffices to define it as a subclass of Exception.
You can have a look at the senders of resume: to see how it can be used.
Passing exceptions on
To illustrate the remaining possibilities for handling exceptions such as passing an exception, we will look at how to implement a generalization of the
perform: method. If we send perform: aSymbol to an object, this will cause the
message named aSymbol to be sent to that object:
5 perform: #factorial --> 120
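Selectors that take arguments can be performed too, using the variants of perform: that accept arguments. A few illustrative expressions of our own:

```smalltalk
| sum prefix inRange |
"One argument."
sum := 3 perform: #+ with: 4.
"sum is 7"

"Two arguments."
prefix := 'hello' perform: #copyFrom:to: with: 1 with: 4.
"prefix is 'hell'"

"Arguments supplied as an array."
inRange := 3 perform: #between:and: withArguments: #(1 5).
"inRange is true"
```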
These perform:-like methods are very useful for accessing an interface dynamically, since the messages to be sent can be determined at run-time. One
However, there is a complication. There might be a selector in the collection that the object does not understand (such as #activate). We would like
to ignore such selectors and continue sending the remaining messages. The
following implementation seems to be reasonable:
Object>>performAll: selectorCollection
    selectorCollection do: [:each |
        [self perform: each]
            on: MessageNotUnderstood
            do: [:ex | ex return]] "also ignores internal errors"
This has the effect of passing on MessageNotUnderstood errors to the surrounding context when they are not part of the list of messages we are performing. The pass message will pass the exception to the next applicable
handler in the execution stack.
If there is no next handler on the stack, the defaultAction message is sent
to the exception instance. The pass action does not modify the sender chain
in any way, but the handler to which control is passed may do so. Like the other
messages discussed in this section, pass is special: it never returns to the
sender.
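The effect of pass can be traced with two nested handlers. In this sketch of our own, the inner handler records its visit and passes the exception on; the outer handler then decides the result.

```smalltalk
| log result |
log := OrderedCollection new.
result := [ [ 1/0 ]
        on: ZeroDivide
        do: [ :ex |
            log add: 'inner'.
            ex pass.
            log add: 'never reached' ] ]   "pass does not return here"
    on: ZeroDivide
    do: [ :ex |
        log add: 'outer'.
        ex return: 0 ].
"result is 0; log holds 'inner' 'outer'"
```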
The goal of this section has been to demonstrate the power of exceptions.
It should be clear that while you can do almost anything with exceptions, the
code that results is not always easy to understand. There is often a simpler
way to get the same effect without exceptions; see method 13.2
for a better way to implement performAll:.
Resending exceptions
Suppose that in our performAll: example we no longer want to ignore selectors
not understood by the receiver, but instead we want to consider an occurrence of such a selector as an error. However, we want it to be signaled as an
application-specific exception, let's say InvalidAction, rather than the generic
MessageNotUnderstood. In other words, we want the ability to resignal a
signaled exception as a different one.
It might seem that the solution would simply be to signal the new exception in the handler block. The handler block in our implementation of
performAll: would be:
[:ex | (ex receiver == self and: [ex message selector == each])
ifTrue: [InvalidAction signal] "signals from the wrong context"
ifFalse: [ex pass]]
13.10
The ANSI protocol also specifies the outer behavior. The method outer is very
similar to pass. Sending outer to an exception also evaluates the enclosing
handler action. The only difference is that if the outer handler resumes the
exception, then control will be returned to the point where outer was sent,
not the original point where the exception was signaled:
passResume := [[ Warning signal . 1 ] "resume to here"
on: Warning
do: [ :ex | ex pass . 2 ]]
on: Warning
do: [ :ex | ex resume ].
passResume --> 1 "resumes to original signal point"
outerResume := [[ Warning signal . 1 ]
on: Warning
do: [ :ex | ex outer . 2 ]] "resume to here"
on: Warning
do: [ :ex | ex resume ].
outerResume --> 2 "resumes to where outer was sent"
13.11
Now that we have seen how exceptions work, we present the interplay between
exceptions and the ensure: and ifCurtailed: semantics. Exception handlers are
executed first, and then the ensure: or ifCurtailed: blocks are executed. The argument of ensure: is
always executed, while the argument of ifCurtailed: is executed only when the execution of its receiver
leads to an unwound stack.
The following example shows such behavior. It prints 'should show first
error' followed by 'then should show curtailed', and returns 4.
[[ 1/0 ]
ifCurtailed: [ Transcript show: 'then should show curtailed'; cr. 6 ]]
on: Error do: [ :e |
Transcript show: 'should show first error'; cr.
e return: 4 ].
First, [1/0] raises a division-by-zero error. This error is handled by the
exception handler, which prints the first message. The handler then returns the value 4 and,
since the receiver raised an error, the argument of the ifCurtailed: message is
evaluated: it prints the second message. Note that the value of the ifCurtailed: argument
(here 6) does not change the return value expressed by the error handler.
The following expression shows that when the stack is not unwound the
expression value is simply returned and none of the handlers are executed.
1 is returned.
[[ 1 ]
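A minimal sketch of such an expression, modeled on the previous example; the Transcript strings are our own placeholders. Since nothing is signaled, neither block prints anything.

```smalltalk
[ [ 1 ]
    ifCurtailed: [ Transcript show: 'curtailed'; cr. 6 ] ]
        on: Error
        do: [ :e |
            Transcript show: 'error'; cr.
            e return: 4 ].
"answers 1; nothing is shown on the Transcript"
```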
The following expression shows that when an error occurs the handler
associated with the error is executed before the ensure: argument. Here the
expression prints 'should show error first', then 'then should show ensure', and it returns 4.
[[ 1/0 ]
ensure: [ Transcript show: 'then should show ensure'; cr. 6 ]]
on: Error do: [ :e |
Transcript show: 'should show error first'; cr.
e return: 4 ].
Finally, the last expression shows that error handlers are executed one by one, from
the closest to the farthest from the error, and then the ensure: argument is executed. Here
'error 1', then 'error 2', and then 'then should show ensure' are displayed.
[[[ 1/0 ] ensure: [ Transcript show: 'then should show ensure'; cr. 6 ]]
on: Error do: [ :e|
Transcript show: 'error 1'; cr.
e pass ]] on: Error do: [ :e |
Transcript show: 'error 2'; cr. e return: 4 ].
13.12
Example: Deprecation
Deprecation offers a case study of a mechanism built using resumable exceptions. Deprecation is a software re-engineering pattern that allows us to
mark a method as being deprecated, meaning that it may disappear in
a future release and should not be used by new code. In Pharo, a method
can be marked as deprecated as follows:
Utilities class>>convertCRtoLF: fileName
    "Convert the given file to LF line endings. Put the result in a file with the extension '.lf'"
    self deprecated: 'Use ''FileStream convertCRtoLF: fileName'' instead.'
        on: '10 July 2009' in: #Pharo1.0 .
    FileStream convertCRtoLF: fileName
13.13
Object>>halt
    "This is the typical message to use for inserting breakpoints during
    debugging. It behaves like halt:, but does not call on halt: in order to
    avoid putting this message on the stack. Halt is especially useful when
    the breakpoint message is an arbitrary one."
    Halt signal
Halt is a direct subclass of Exception. A Halt exception is resumable, which
means that it is possible to continue execution after a Halt is signaled.
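Because Halt is resumable, a handler can make a breakpoint invisible. The sketch below is our own illustration: the handler resumes the Halt, so no debugger opens and the block runs to completion.

```smalltalk
| result |
result := [ Halt signal.
            'execution continued' ]
    on: Halt
    do: [ :ex | ex resume ].
result
"result is 'execution continued'"
```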
Halt overrides the defaultAction method, which specifies the action to perform if the exception is not caught (i.e., there is no exception handler for Halt
anywhere on the execution stack):
Halt>>defaultAction
    "No one has handled this error, but now give them a chance to decide
    how to debug it. If no one handles this then open debugger
    (see UnhandledError>>defaultAction)"
    UnhandledError signalForException: self
This code signals a new exception, UnhandledError, that conveys the idea
that no handler is present. The defaultAction of UnhandledError is to open a debugger:
UnhandledError>>defaultAction
    "The current computation is terminated. The cause of the error should be logged or
    reported to the user. If the program is operating in an interactive debugging
    environment the computation should be suspended and the debugger activated."
    ^ UIManager default unhandledErrorDefaultAction: self exception
MorphicUIManager>>unhandledErrorDefaultAction: anException
    ^ Smalltalk tools debugError: anException.
13.14
Specific exceptions
The class Exception in Pharo has ten direct subclasses, as shown in Figure 13.4.
The first thing that we notice from this figure is that the Exception hierarchy
is a bit of a mess; you can expect to see some of the details change as Pharo
is improved.
If you declare a new subclass of exceptions, you should look in its protocol
for the isResumable method, and override it as appropriate to the semantics
of your exception.
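The resumability of the standard exception classes can be queried directly, and trying to resume a non-resumable exception is itself reported as an exception. This is a sketch of our own; it assumes, as in the Pharo images we know, that the failed attempt signals IllegalResumeAttempt.

```smalltalk
| outcome |
Error new isResumable.          "false"
Warning new isResumable.        "true"
Notification new isResumable.   "true"

"Error is not resumable, so resume: signals IllegalResumeAttempt,
 which the outer handler catches (assumption stated above)."
outcome := [ [ Error signal: 'fatal' ]
        on: Error
        do: [ :ex | ex resume: 1 ] ]
    on: IllegalResumeAttempt
    do: [ :ex | ex return: #cannotResume ].
outcome
"outcome is #cannotResume"
```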
In some situations, it will never make sense to resume an exception. In
such a case you should signal a non-resumable subclass, either an existing
one or one of your own creation. In other situations, it will always be OK
to resume an exception, without the handler having to do anything. In fact,
this gives us another way of characterizing a notification: a Notification is a
resumable Exception that can be safely resumed without first modifying the
state of the system. More often, it will be safe to resume an exception only
if the state of the system is first modified in some way. So, if you signal a
resumable exception, you should be very clear about what you expect an
exception handler to do before it resumes the exception.
When defining a new exception. It is difficult to decide when it is worth
defining a new exception instead of reusing an existing one. Here are some
heuristics: you should evaluate whether
you can have an adequate solution to the exceptional situation,
you need a specific default behavior when the exceptional situation is
not handled, and
you need to store more information to handle the exceptional case.
13.15
Just because Pharo has exception handling, you should not conclude that it
is always appropriate to use. As stated in the introduction to this chapter,
exception handling is for exceptional situations. Therefore, the first
rule for using exceptions is not to use them for situations that can reasonably
be expected to occur in a normal execution.
Of course, if you are writing a library, what is normal depends on the
context in which your library is used. To make this concrete, let's look at
Dictionary as an example: aDictionary at: aKey will signal an Error if aKey is not
present. But, you should not write a handler for this error! If the logic of
your application is such that there is some possibility that the key will not be
in the dictionary, then you should instead use at: aKey ifAbsent: [remedial action].
In fact, Dictionary>>at: is implemented using Dictionary>>at:ifAbsent:. aCollection detect: aPredicateBlock is similar: if there is any possibility that the predicate
might not be satisfied, you should use aCollection detect: aPredicateBlock ifNone:
[remedial action].
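In a workspace, the remedial-block variants look as follows. This is a minimal sketch; the keys and values are arbitrary examples of our own.

```smalltalk
| stock count firstEven |
stock := Dictionary new.
stock at: #apple put: 3.

"The key is absent: the remedial block supplies the value, no exception is signaled."
count := stock at: #pear ifAbsent: [ 0 ].
"count is 0"

"No element satisfies the predicate: the ifNone: block supplies the value."
firstEven := #(1 3 5) detect: [ :each | each even ] ifNone: [ nil ].
"firstEven is nil"
```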
When you write methods that signal exceptions, consider whether you
should also provide an alternative method that takes a remedial block as an
additional argument, and evaluates it if the normal action cannot be completed. Although this technique can be used in any programming language
that supports closures, because Smalltalk uses closures for all its control
structures, it is a particularly natural one to use in Smalltalk.
Another way of avoiding exception handling is to test the precondition
of the exception before sending the message that may signal it. For example,
in method 13.1, we sent a message to an object using perform:, and handled
the MessageNotUnderstood error that might ensue. A much simpler alternative
is to check to see if the message is understood before executing the perform:.
Method 13.2: Object>>performAll: revisited
performAll: selectorCollection
    selectorCollection
        do: [:each | (self respondsTo: each)
            ifTrue: [self perform: each]]
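The test itself can be tried in a workspace. In this sketch of our own, #activate stands for a selector that integers do not understand:

```smalltalk
| understood |
"Keep only the selectors that the receiver can respond to."
understood := #(#factorial #activate #printString)
    select: [ :each | 5 respondsTo: each ].
understood
"understood is #(#factorial #printString)"
```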
13.16
Exceptions implementation
Up to now, we have presented the use of exceptions without really explaining in depth how they are implemented. Note that since you do not need
to know how exceptions are implemented to use them, you can simply skip
this section on the first reading. Now if you are curious and really want to
know how they are implemented, this section is for you.
The mechanism is quite simple, making it worthwhile to know how it
operates. Contrary to most mainstream languages, exceptions are implemented on the language side without virtual machine support, using the
reification of the runtime stack as a linked list of contexts (method or closure
activation records). Let's have a look at how exceptions are implemented and
how they use contexts to store their information.
There is no instance variable here to store the exception class or the handler, nor is there any place in the superclass to store them. However, note
that MethodContext is defined as a variableSubclass. This means that in addition
to the named instance variables, instances of this class have some indexed
slots. Every MethodContext has indexed slots, that are used to store, among
In the protected block, we query the context that represents the protected
block execution using thisContext sender. This execution was triggered by the
on:do: message execution. The last line explores a 2-element array that contains the exception class and the exception handler.
If you get some strange results using halt and inspect inside the protected
block, note that as the method is being executed, the state of the context object changes, and when the method returns, the context is terminated, setting
to nil several of its fields. Opening an explorer on thisContext will show you
that the context sender is effectively the execution of the method on:do:.
Note that you can also execute the following code:
[thisContext sender explore] on: Error do: [:ex|].
You obtain an explorer and you can see that the exception class and the
handler are stored in the first and second indexed variables of the
method context object (a method context represents an execution stack element).
We see that on:do: execution stores the exception class and its handler on
the method context. Note that this is not specific to on:do: but any message
execution stores arguments on its corresponding context.
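The same information can be read programmatically instead of through an explorer. The sketch below is our own and relies on the layout described in this section (the exception class in the first temporary of the on:do: context), which may differ in other Pharo versions.

```smalltalk
| seen |
seen := [ | handlerContext |
    "The sender of the protected block's context is the on:do: context."
    handlerContext := thisContext sender.
    Array
        with: handlerContext isHandlerContext
        with: (handlerContext tempAt: 1) ]
    on: ZeroDivide
    do: [ :ex | nil ].
seen
"Under the stated assumption: an Array holding true and the class ZeroDivide"
```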
Finding Handlers. Now that we know where the information is stored,
let's have a look at how it is found at runtime.
As discussed before, the primitive 199 (the one used by on:do:) always fails!
As the primitive always fails, the Smalltalk body of on:do: is always executed.
However, the presence of the <primitive: 199> is used as a marker.
The source code of the primitive is found in Interpreter>>primitiveMarkHandlerMethod.
Figure 13.5: Explore a method context to find the exception class and the
handler.
Interpreter>>primitiveMarkHandlerMethod
    "Primitive. Mark the method for exception handling. The primitive must fail after
    marking the context so that the regular code is run."
    self inline: false.
    ^ self primitiveFail
Now we know that the context corresponding to the method on:do: is marked,
and that a context has a direct reference, through an instance variable, to the
method it has activated. Therefore, we can know if the context is an exception handler by checking if the method it has activated holds primitive 199.
That is what the method isHandlerContext does (code below).
MethodContext>>isHandlerContext
    "is this context for method that is marked?"
    ^ method primitive = 199
ContextPart>>nextHandlerContext
    ^ self sender findNextHandlerContextStarting
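The look-up can be imitated from a workspace by following sender links until a context marked as a handler is found. This is only a sketch of the idea, written by us with isHandlerContext and tempAt:; it is not the system's own implementation and depends on the context layout of the Pharo version at hand.

```smalltalk
| found |
found := [ | ctx |
    ctx := thisContext.
    "Walk down the stack until a context for on:do: is reached."
    [ ctx isNil or: [ ctx isHandlerContext ] ]
        whileFalse: [ ctx := ctx sender ].
    "The first temporary of the handler context is the exception class."
    ctx ifNotNil: [ :handlerContext | handlerContext tempAt: 1 ] ]
    on: ZeroDivide
    do: [ :ex | nil ].
found
"Under the stated assumptions: the class ZeroDivide"
```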
Since the method context supplied by findNextHandlerContextStarting contains all the exception-handling information, it can be examined to see if
the exception class is suitable for handling the current exception. If so, the
associated handler can be executed; if not, the look-up can continue further.
This is all implemented in the handleSignal: method.
ContextPart>>handleSignal: exception
    "Sent to handler (on:do:) contexts only. If my exception class (first arg) handles
    exception then execute my handle block (second arg), otherwise forward this
    message to the next handler context. If none left, execute exception's defaultAction
    (see nil>>handleSignal:)."
    | value |
    ((self exceptionClass handles: exception)
        and: [self exceptionHandlerIsActive])
        ifFalse: [ ^ self nextHandlerContext handleSignal: exception ].
    exception privHandlerContext: self contextTag.
    "disable self while executing handle block"
    self exceptionHandlerIsActive: false.
    value := [ self exceptionHandlerBlock cull: exception ]
        ensure: [ self exceptionHandlerIsActive: true ].
    "return from self if not otherwise directed in handle block"
    self return: value.
ContextPart>>exceptionClass
"handlercontext only. access temporaries from BlockClosure>>#on:do:"
^self tempAt: 1
exceptionHandlerBlock
"handlercontext only. access temporaries from BlockClosure>>#on:do:"
^self tempAt: 2
exceptionHandlerIsActive
"handlercontext only. access temporaries from BlockClosure>>#on:do:"
^self tempAt: 3
exceptionHandlerIsActive: aBoolean
"handlercontext only. access temporaries from BlockClosure>>#on:do:"
self tempAt: 3 put: aBoolean
Notice how this method uses tempAt: 1 to access the exception class, and
asks whether it handles the exception. What about tempAt: 3? That is the temporary
variable handlerActive of the on:do: method. Checking that handlerActive is true
and then setting it to false ensures that a handler will not be asked to handle
an exception that it signals itself. The return: message sent as the final action
of handleSignal: is responsible for unwinding the stack, i.e., removing all the
contexts between the exception signaler context and its exception handler, as
well as executing unwind blocks (blocks created with ensure:).
To summarize, the signal method, with optional assistance from the virtual machine for performance, finds the context that corresponds to an on:do:
message with an appropriate exception class. Because the execution stack
is made up of a linked list of Context objects that may be manipulated just
like any other object, the stack can be shortened at any time. This is a superb
example of the flexibility of Pharo.
13.17
Ensure:'s implementation
The <primitive: 198> works the same way as the <primitive: 199> we saw in
the previous section. It always fails; however, its presence marks the method
in a way that can easily be detected from the context activating this method.
Moreover, the unwind block is stored in the same way as the exception class
and its associated handler. More explicitly, it is stored in the context of the ensure:
method execution, which can be accessed from the block through thisContext
sender tempAt: 1.
In the case where the block does not fail and does not have a non-local return, the ensure: message implementation executes the block, stores the result
in the returnValue variable, executes the argument block and lastly returns the
result of the block previously stored. The complete variable is here to prevent
the argument block from being executed twice.
Ensuring a failing block. The ensure: message will execute the argument
block even if the block fails. In the following example, the ensureWithOnDo
message returns 2 and executes 1. In the subsequent section we will carefully
look at where and what the block is actually returning and in which order
the blocks are executed.
Bexp>>ensureWithOnDo
^[ [ Error signal ] ensure: [ 1 ].
^3 ] on: Error do: [ 2 ]
Bexp>>mainBlock
^[ self traceCr: 'mainBlock start'.
self failingBlock ensure: self ensureBlock.
self traceCr: 'mainBlock end' ]
Bexp>>failingBlock
^[ self traceCr: 'failingBlock start'.
Error signal.
self traceCr: 'failingBlock end' ]
Bexp>>ensureBlock
^[ self traceCr: 'ensureBlock value'.
#EnsureBlockValue ]
Bexp>>exceptionHandlerBlock
^[ self traceCr: 'exceptionHandlerBlock value'.
#ExceptionHandlerBlockValue ]
Bexp>>start
| res |
self traceCr: 'start start'.
res := self mainBlock on: Error do: self exceptionHandlerBlock.
self traceCr: 'start end'.
self traceCr: 'The result is: ', res, '.'.
^ res
Executing Bexp new start prints the following (we added indentation to
stress the calling flow).
start start
    mainBlock start
        failingBlock start
            exceptionHandlerBlock value
        ensureBlock value
start end
The result is: ExceptionHandlerBlockValue.
There are three important things to see. First, the failing block and the
main block are not fully executed because of the signal message. Secondly, the
exception block is executed before the ensure block. Lastly, the start method
will return the result of the exception handler block.
To understand how this works, we have to look at the end of the exception implementation. We finish the previous explanation of the handleSignal:
method.
ContextPart>>handleSignal: exception
"Sent to handler (on:do:) contexts only. If my exception class (first arg) handles
exception then execute my handle block (second arg), otherwise forward this
message to the next handler context. If none left, execute exception's defaultAction
(see nil>>handleSignal:)."
| value |
((self exceptionClass handles: exception)
and: [self exceptionHandlerIsActive])
ifFalse: [ ^ self nextHandlerContext handleSignal: exception ].
exception privHandlerContext: self contextTag.
"disable self while executing handle block"
self exceptionHandlerIsActive: false.
value := [ self exceptionHandlerBlock cull: exception ]
ensure: [ self exceptionHandlerIsActive: true ].
"return from self if not otherwise directed in handle block"
self return: value.
In our example, Pharo will execute the failing block, then will look for
the next handler context, marked with <primitive: 199>. As for a regular exception, Pharo finds the exception handler context and runs the exceptionHandlerBlock. The method handleSignal: finishes by sending the return: message. Let's
have a look at it.
ContextPart>>return: value
"Unwind thisContext to self and return value to self's sender. Execute any unwind
blocks while unwinding. ASSUMES self is a sender of thisContext"
sender ifNil: [self cannotReturn: value to: sender].
sender resume: value
The return: message checks whether the context has a sender and, if not, signals
a CannotReturn exception. Otherwise the resume: message is sent to the sender of
this context.
ContextPart>>resume: value
"Unwind thisContext to self and resume with value as result of last send. Execute
unwind blocks when unwinding. ASSUMES self is a sender of thisContext"
self resume: value through: (thisContext findNextUnwindContextUpTo: self)
ContextPart>>resume: value through: firstUnwindContext
"Unwind thisContext to self and resume with value as result of last send.
Execute any unwind blocks while unwinding.
ASSUMES self is a sender of thisContext."
| context unwindBlock |
self isDead
ifTrue: [ self cannotReturn: value to: self ].
context := firstUnwindContext.
This is the method where the argument block of ensure: is executed. This
method looks for all the unwind contexts between the context of the method
resume: and self, which is the sender of the on:do: context (in our case the
context of start). When the method finds an unwind context, its unwind
block is executed. Lastly, it sends the terminateTo: message.
ContextPart>>terminateTo: previousContext
"Terminate all the Contexts between me and previousContext, if previousContext is on
my Context stack. Make previousContext my sender."
| currentContext sendingContext |
<primitive: 196>
(self hasSender: previousContext) ifTrue: [
currentContext := sender.
[currentContext == previousContext] whileFalse: [
sendingContext := currentContext sender.
currentContext terminate.
currentContext := sendingContext]].
sender := previousContext
Basically, this method terminates all the contexts between thisContext and
self, which is the sender of the on:do: context (in our case the context of
start). Moreover, self becomes the sender of thisContext. It is implemented as a
primitive for performance only, so the primitive is optional and the fallback
code has the same behavior.
Let's summarize what happens with Figure 13.6, which represents the
execution of the method ensureWithOnDo defined previously.
Ensuring a non-local return. The method resume:through: is also called when
performing a non-local return. In the case of a non-local return, the stack is
unwound in a similar way as for an exception. The virtual machine, while
performing a non-local return, sends the message aboutToReturn:through: to the
active context. Therefore, if one has changed the implementation of exception in the language,
Figure 13.6: Execution of the method ensureWithOnDo. Each frame is a context,
shown with its number, the receiver of the method, the method class and name,
the method body, and its sender. The contexts involved are those of
Bexp>>ensureWithOnDo, BlockClosure>>ensure:, Exception>>signal,
ContextPart>>handleSignal:, ContextPart>>return: and ContextPart>>resume:.
The methods ensure: and signal appear in the figure as follows.
BlockClosure>>ensure: aBlock
| complete returnValue |
<primitive: 198>
returnValue := self valueNoContextSwitch.
complete ifNil: [
complete := true.
aBlock value ].
^ returnValue
Exception>>signal
signalContext := thisContext contextTag.
signaler ifNil: [ signaler := self receiver ].
^ signalContext nextHandlerContext handleSignal: self
13.18
Chapter summary
In this chapter we saw how to use exceptions to signal and handle abnormal
situations arising in our code.
Do not use exceptions as a control-flow mechanism. Reserve them for
notifications and for abnormal situations. Consider providing methods
that take blocks as arguments as an alternative to signaling exceptions.
Use protectedBlock ensure: actionBlock to ensure that actionBlock will be
performed even if protectedBlock terminates abnormally.
Use protectedBlock ifCurtailed: actionBlock to ensure that actionBlock will be
performed only if protectedBlock terminates abnormally.
Exceptions are objects. Exception classes form a hierarchy with the
class Exception at the root of the hierarchy.
Use protectedBlock on: ExceptionClass do: handlerBlock to catch exceptions
that are instances of ExceptionClass (or any of its subclasses). The handlerBlock should take an exception instance as its sole argument.
Exceptions are signaled by sending one of the messages signal or signal:.
signal: takes a descriptive string as its argument. The description of an
exception can be obtained by sending it the message description.
You can set a breakpoint in your code by inserting the message-send
self halt. This signals a resumable Halt exception, which, by default, will
open a debugger at the point where the breakpoint occurs.
When an exception is signaled, the runtime system will search up the
execution stack, looking for a handler for that specific class of exception. If none is found, the defaultAction for that exception will be performed (i.e., in most cases the debugger will be opened).
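The items on ifCurtailed:, on:do: and signal: can be tried together in a playground. This is a small sketch under our own naming (log and result are illustrative variables); it relies only on the messages listed in this summary plus messageText, which answers the string given to signal:.

```smalltalk
| log result |
log := OrderedCollection new.
result := [ [ Error signal: 'boom' ]
		ifCurtailed: [ log add: 'curtailed' ] ]
	on: Error
	do: [ :ex | log add: ex messageText. #recovered ].
result.        "#recovered, the value of the handler block"
log asArray.   "expected: #('boom' 'curtailed'), handler first, then unwind block"
```

The order in the log mirrors the trace shown earlier in the chapter: the handler block runs first and the unwind block is executed while the stack is being unwound.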
Chapter 14
14.1
Basics
What is a block? A block is a lambda expression that captures (or closes over)
its environment at creation-time. We will see later what it means exactly. For
now, imagine a block as an anonymous function or method. A block is a
piece of code whose execution is frozen and can be kicked in using messages.
Blocks are defined by square brackets.
If you execute and print the result of the following code, you will not get
3, but a block. Indeed, you did not ask for the block value, but just for the
block itself, and you got it.
[1+2]
[1+2]
A block is evaluated by sending the value message to it. More precisely, blocks can be evaluated using value (when the block takes no argument), value: (when the block requires one argument), value:value: (for two arguments), value:value:value: (for three) and valueWithArguments: anArray (for more
arguments). These messages are the basic and historical API for block evaluation. They were presented in the Pharo by Example book.
[ 1 + 2 ] value
3
[ :x | x + 2 ] value: 5
7
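The remaining evaluation messages follow the same pattern. The expressions below are a short illustration; each can be printed separately in a playground.

```smalltalk
[ :x :y | x + y ] value: 3 value: 4.
"7"
[ :a :b :c :d | a + b + c + d ] valueWithArguments: #(1 2 3 4).
"10"
[ :x :y | x + y ] numArgs.
"2, the number of arguments the block expects"
```

Sending an evaluation message whose number of arguments does not match numArgs signals an error instead of evaluating the block.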
Other messages. Some messages are useful to profile evaluation (more information in Chapter 17):
bench. Return how many times the receiver block can be evaluated in 5
seconds.
durationToRun. Answer the duration (instance of class Duration) taken to evaluate the receiver.
Some messages are related to error handling (as explained in Chapter 13).
ensure: terminationBlock. Evaluate the termination block after evaluating the
is raised, fork a new process, which will handle the error. The original process will continue running as if the receiver evaluation finished
and answered nil, i.e., an expression like [ self error: 'some error' ] on: Error fork: [ :ex | 123 ] will always answer nil to the original process. The
context stack, starting from the context which sent this message to the
receiver and up to the top of the stack will be transferred to the forked
process, with the catch block on top. Eventually, the catch block will
be evaluated in the forked process.
Some messages are related to process scheduling. We list the most important ones. Since this chapter is not about concurrent programming in Pharo,
we will not go deep into them.
fork. Create and schedule a Process evaluating the receiver.
forkAt: aPriority. Create and schedule a Process evaluating the receiver at the
scheduled.
14.2
A block can have its own temporary variables. Such variables are initialized
during each block execution and are local to the block. We will see later how
such variables are kept. Now the question we want to make clear is what is
happening when a block refers to other (non-local) variables. A block will
close over the external variables it uses. It means that even if the block is
executed later in an environment that does not lexically contain the variables
used by a block, the block will still have access to the variables during its
execution. Later, we will present how local variables are implemented and
stored using contexts.
In Pharo, private variables (such as self, instance variables, method temporaries and arguments) are lexically scoped: an expression in a method can
access the variables visible from that method, but the same expression put
in another method or class cannot access the same variables because they are
not in the scope of the expression (i.e., visible from the expression).
At runtime, the variables that a block can access are bound (get a value
associated with them) in the context in which the block that contains them is
defined, rather than the context in which the block is evaluated. It means that
a block, when evaluated somewhere else, can access variables that were in
its scope (visible to the block) when the block was created. Traditionally, the
context in which a block is defined is named the block home context.
The block home context represents a particular point of execution (since
this is a program execution that created the block in the first place), therefore
this notion of block home context is represented by an object that represents
program execution: a context object in Pharo. In essence, a context (called
stack frame or activation record in other languages) represents information
about the current evaluation step such as the context from which the current
one is executed, the next byte code to be executed, and temporary variable
values. A context is a Pharo execution stack element. This is important and
we will come back later to this concept.
A block is created inside a context (an object that represents a point in the execution).
classVariableNames: ''
poolDictionaries: ''
category: 'BlockExperiment'
Experiment 1: Variable lookup. A variable is looked up in the block definition context. We define two methods: one that defines a variable t and sets
it to 42 and a block [t traceCr] and one that defines a new variable with the
same name and executes a block defined elsewhere.
Bexp>>setVariableAndDefineBlock
|t|
t := 42.
self evaluateBlock: [ t traceCr ]
Bexp>>evaluateBlock: aBlock
|t|
t := nil.
aBlock value
Bexp new setVariableAndDefineBlock
42
Figure 14.1: Non-local variables are looked up in the method activation context
where the block was created and not where it is evaluated.
Bexp>>setVariableAndDefineBlock3
|t|
t := 42.
self evaluateBlock: [ t traceCr. t := 33. t traceCr ].
self evaluateBlock: [ t traceCr. t := 66. t traceCr ].
self evaluateBlock: [ t traceCr ]
Bexp new setVariableAndDefineBlock3
42
33
33
66
66
Bexp new setVariableAndDefineBlock3 will print 42, 33, 33, 66 and 66. Here the
two blocks [ t traceCr. t := 33. t traceCr ] and [ t traceCr. t := 66. t traceCr ]
access the same variable t and can modify it. During the first execution of the
method evaluateBlock: its current value 42 is printed, then the value is changed
and printed. A similar situation occurs with the second call. This example shows
that blocks share the location where variables are stored and also that a block
does not copy the value of a captured variable. It just refers to the location
of the variable, and several blocks can refer to the same location.
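The same sharing can be observed without defining any method. In this playground sketch (the names increment and read are ours), two blocks close over the same variable t, and the enclosing code sees the change as well.

```smalltalk
| t increment read |
t := 42.
increment := [ t := t + 1 ].
read := [ t ].
"Both blocks refer to the same location, not to a copy of 42."
increment value.
increment value.
read value.   "44"
t.            "44"
```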
Here the initial value of the variable t is 42. The block is created and
stored into the instance variable block but the value of t is changed to 69
before the block is evaluated. And this is the last value (69) that is effectively printed because it is looked up at execution-time. Executing Bexp new
setVariableAndDefineBlock4 prints 69.
Bexp>>setVariableAndDefineBlock4
|t|
t := 42.
block := [ t traceCr ].
t := 69.
self evaluateBlock: block
Experiment 5: For method arguments. We can expect that method arguments are bound in the context of the defining method. Let's illustrate this
point now. Define the following methods.
Bexp>>testArg
self testArg: 'foo'.
Bexp>>testArg: arg
block := [arg traceCr].
self evaluateBlockAndIgnoreArgument: 'zork'.
Bexp>>evaluateBlockAndIgnoreArgument: arg
block value.
Now executing Bexp new testArg: 'foo' prints 'foo' even if in the method
evaluateBlockAndIgnoreArgument: the temporary arg is redefined. In fact each
method invocation has its own values for the arguments.
Experiment 6: self binding. Now we may wonder if self is also captured.
To test this we need another class. Let's simply define a new class and a couple of
methods. Add the instance variable x to the class Bexp and define the initialize
method as follows:
Object subclass: #Bexp
instanceVariableNames: 'block x'
classVariableNames: ''
poolDictionaries: ''
category: 'BlockExperiment'
Bexp>>initialize
super initialize.
x := 123.
Bexp2>>initialize
super initialize.
x := 69.
Bexp2>>evaluateBlock: aBlock
aBlock value
Then define the methods that will invoke methods defined in Bexp2.
Bexp>>evaluateBlock: aBlock
Bexp2 new evaluateBlock: aBlock
Bexp>>evaluateBlock
self evaluateBlock: [self crTrace ; traceCr: x]
Bexp new evaluateBlock
a Bexp123 "and not a Bexp269"
Block-local variables
As we saw previously, a block is a lexical closure that is connected to the
place where it is defined. In the following, we will illustrate this connection
by showing that block local variables are allocated in the execution context
linked to their creation. We will show the difference when a variable is local to
a block or to a method (see Figure 14.2).
Block allocation. Implement the following method blockLocalTemp.
Bexp>>blockLocalTemp
| collection |
collection := OrderedCollection new.
#(1 2 3) do: [ :index |
| temp |
temp := index.
collection add: [ temp ] ].
^ collection collect: [ :each | each value ]
Let's comment on the code: we create a loop that stores the current index
(a block argument) in a temporary variable temp created in the loop. We
then store a block that accesses this variable in a collection. After the loop,
we execute each accessing block and return the collection of values. If we
execute this method, we get a collection with 1, 2 and 3. This result shows
that each block in the collection refers to a different temp variable. This is due
to the fact that an execution context is created for each block creation (at each
loop step) and that the block [ temp ] is stored in this context.
Method allocation. Now let us create a new method that is the same as
blockLocalTemp except that the variable temp is a method variable instead of a
block variable.
Bexp>>blockOutsideTemp
| collection temp |
collection := OrderedCollection new.
#(1 2 3) do: [ :index |
temp := index.
collection add: [ temp ] ].
^ collection collect: [ :each | each value ]
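The two allocation strategies can be compared side by side in a playground. This is a sketch under our own naming: perBlock, perMethod, local and shared are illustrative variables, with shared playing the role of the method-level temp of blockOutsideTemp.

```smalltalk
| perBlock perMethod shared |
perBlock := OrderedCollection new.
perMethod := OrderedCollection new.
#(1 2 3) do: [ :index |
	| local |
	"local is created anew for each activation of the loop block"
	local := index.
	"shared is declared outside the loop, so there is only one of it"
	shared := index.
	perBlock add: [ local ].
	perMethod add: [ shared ] ].
(perBlock collect: [ :each | each value ]) asArray.    "#(1 2 3)"
(perMethod collect: [ :each | each value ]) asArray.   "#(3 3 3)"
```

The blocks stored in perMethod all refer to the single outer variable, so they all answer the last value assigned to it.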
14.3
When we execute Bexp new foo, we get 0 and not nil. What you see here is
that the value is shared between the method body and the block. Inside the
method body we can access the variable whose value was set by the block
evaluation. Both the method and block bodies access the same temporary
variable a.
Let's make it slightly more complicated. Define the method twoBlockArray
as follows:
Bexp>>twoBlockArray
|a|
a := 0.
^ {[ a := 2] . [a]}
You can also define the code as follows and open a transcript to see the
results.
| res |
res := Bexp new twoBlockArray.
res second value traceCr.
res first value.
res second value traceCr.
Let us step back and look at an important point. In the previous code
snippet, when the expressions res second value and res first value are executed,
the method twoBlockArray has already finished its execution, so it is no longer
on the execution stack. Still the temporary variable a can be accessed and set to a new value. This experiment shows that the variables
referred to by a block may live longer than the method which created the
block that refers to them. We say that the variables outlive the execution of
their defining method.
You can see from this example that while temporary variables are somehow stored in an activation context, the implementation is a bit more subtle
than that. The block implementation needs to keep referenced variables in a
structure that is not in the execution stack but lives on the heap. The compiler
performs some analysis and when it detects that a variable may outlive its
creation context, it allocates the variables in a structure that is not allocated
on the execution stack.
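A variable outliving its creation context can also be shown with a block that builds other blocks. In this sketch (makeCounter, first and second are our own names), the variable count belongs to an activation of makeCounter that has finished by the time the inner block is evaluated.

```smalltalk
| makeCounter first second |
makeCounter := [ | count |
	count := 0.
	[ count := count + 1 ] ].
first := makeCounter value.
second := makeCounter value.
first value.    "1"
first value.    "2, count survived the end of the makeCounter activation"
second value.   "1, each activation of makeCounter has its own count"
```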
14.4
In this section we explain why it is not a good idea to have return statements
inside a block (such as [^ 33]) that you pass or store into instance variables.
A block with an explicit return statement is called a non-local returning block.
Let us start illustrating some basic points first.
Basics on return
By default the returned value of a method is the receiver of the message,
i.e., self. A return expression (the expression starting with the character ^)
allows one to return a different value than the receiver of the message. In
addition, the execution of a return statement exits the currently executed
method and returns to its caller. This ignores the expressions following the
return statement.
Experiment 7: Return's exiting behavior. Define the following method.
Executing Bexp new testExplicitReturn prints 'one' and 'two' but it will not print
'not printed', since the method testExplicitReturn will have returned before.
Bexp>>testExplicitReturn
self traceCr: 'one'.
0 isZero ifTrue: [ self traceCr: 'two'. ^ self].
self traceCr: 'not printed'
Note that the return expression should be the last statement of a block
body.
For example, the following expression Bexp new jumpingOut will return 3
and not 42. ^ 42 will never be reached. The expression [ ^3 ] could be deeply
nested; its execution jumps out of all the levels and returns to the method caller.
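A method with this behavior could look like the following sketch. Only the name jumpingOut and the values 3 and 42 come from the text; the body, including the collection being iterated, is our assumption.

```smalltalk
Bexp>>jumpingOut
	"Hypothetical body, for illustration only."
	#(1 2 3 4) do: [ :each |
		each = 3 ifTrue: [ ^ 3 ] ].
	"Never reached: the non-local return above exits jumpingOut itself."
	^ 42
```

Here [ ^ 3 ] is nested inside the ifTrue: block, itself nested inside the do: block, and its execution still returns from jumpingOut, so Bexp new jumpingOut answers 3.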
Some old code (predating the introduction of exceptions) passes non-local returning blocks around, leading to complex flows and code that is difficult to maintain. We strongly suggest not using this style because it leads to complex
code and bugs. In subsequent sections we will carefully look at where a
return is actually returning.
Understanding return
Now to see that a return is really escaping the current execution, let us build
a slightly more complex call flow. We define four methods among which
one (defineBlock) creates an escaping block, one (arg:) evaluates this block and
one (evaluateBlock:) executes the block. Note that to stress the escaping
behavior of a return we defined evaluateBlock: so that it endlessly loops after
evaluating its argument.
Bexp>>start
| res |
Executing Bexp new start prints the following (indentation added to stress
the calling flow).
start start
    defineBlock start
        arg start
            evaluateBlock start
                block start
start end
What we see is that the calling method start is fully executed. The method
defineBlock is not completely executed. Indeed, its escaping block [^33] is executed two calls away in the method evaluateBlock:. The evaluation of the
block returns to the block home context sender (i.e., the context that invoked
the method creating the block).
When the return statement of the block is executed in the method
evaluateBlock:, the execution discards the pending computation and returns
to the method execution point that created the home context of the block. The
block is defined in the method defineBlock. The home context of the block is
the activation context that represents the execution of the method defineBlock.
Figure 14.3: A block with non-local return execution returns to the method
execution that activated the block home context. Frames represent contexts
and dashed frames represent the same block at different execution points.
Therefore the return expression returns to the start method execution just after the defineBlock execution. This is why the pending executions of arg: and
evaluateBlock: are discarded and why we see the trace 'start end' printed by the
method start.
As shown by Figure 14.3, [^33] will return to the sender of its home context. The home context of [^33] is the context that represents the execution of the
method defineBlock, therefore it will return its result to the method start.
Step 1 represents the execution up to the invocation of the method
defineBlock. The trace 'start start' is printed.
Step 3 represents the execution up to the block creation, which is done
in Step 2. 'defineBlock start' is printed. The home context of the block is
the defineBlock method execution context.
To verify where the execution will end, you can use the expression
thisContext home sender copy inspect, which returns a method context pointing
to the assignment in the method start.
Then we define a simple assert: method that raises an error if its argument
is false.
Bexp>>assert: aBoolean
aBoolean ifFalse: [Error signal]
Non-local return blocks. As a block is always evaluated in its home context, it is possible to attempt to return from a method execution which has
already returned. This runtime error condition is trapped by the VM.
Bexp>>returnBlock
^ [ ^ self ]
Bexp new returnBlock value
Exception
When we execute returnBlock, the method returns the block to its caller
(here the top level execution). When evaluating the block, because the
method defining it has already terminated and because the block contains a return expression that should normally return to the sender of the block
home context, an error is signaled.
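The error can be caught like any other exception. The sketch below assumes that the image signals an instance of BlockCannotReturn in this situation, and that Bexp>>returnBlock is defined as above; escaped and outcome are our own variable names.

```smalltalk
| escaped outcome |
"returnBlock has already returned when the block is evaluated below."
escaped := Bexp new returnBlock.
outcome := [ escaped value ]
	on: BlockCannotReturn
	do: [ :ex | #homeContextAlreadyReturned ].
outcome.   "#homeContextAlreadyReturned, assuming BlockCannotReturn is signaled"
```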
Conclusion. Blocks with non-local return expressions ([^ ...]) return to the sender
of the block home context (the context representing the execution that led to the
block creation).
14.5
We saw that blocks refer to the home context when looking for variables. So
now we will look at contexts. Contexts represent program execution. The
Pharo execution engine represents its current execution state with the following information:
1. the CompiledMethod whose bytecodes are being executed;
2. the location of the next bytecode to be executed in that CompiledMethod.
This is the interpreter's program pointer;
3. the receiver and arguments of the message that invoked the
CompiledMethod;
4. any temporary variable needed by the CompiledMethod;
5. a call stack.
In Pharo, the class MethodContext represents this execution information.
A MethodContext instance holds information about a specific execution point.
The pseudo-variable thisContext gives access to the current execution point.
Figure 14.4: A method context where we can access the value of the temporary variable temp at that given point of execution.
You will get the inspector shown in Figure 14.4. Note that we copy the
current context obtained using thisContext because the Virtual Machine limits
memory consumption by reusing contexts.
MethodContext represents not only the activation contexts of method executions but also those of blocks. Let us have a look at some values of the
current context:
sender points to the previous context that led to the creation of the current one. Here when you executed the expression, a context was created and this context is the sender of the current one.
method points to the currently executing method.
pc holds a reference to the latest executed instruction. Here its value is
27. To see which instruction is referred to, double click on the method
instance variable and select the all bytecodes field, you should get the
situation depicted in Figure 14.5, which shows that the next instruction
to be executed is pop (instruction 28).
stackp defines the depth of the stack of variables in the context. In most
cases, its value is the number of stored temporary variables (including
arguments). But in certain cases, for example during a message send,
the depth of the stack is increased: the receiver is pushed, then the
arguments, lastly the message send is executed and the depth of the
stack goes back to its previous value.
closureOrNil holds a reference to the currently executing closure or nil.
receiver is the message receiver.
The class MethodContext and its superclasses define many methods to get
information about a particular context. For example, you can get the values of the arguments by sending the arguments message and the value of a
particular temporary variable by sending tempNamed:.
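These accessors can be combined in a small method. The method name describeContextWith: is hypothetical and introduced only for this sketch; receiver, method, arguments and tempNamed: are the messages mentioned in the text.

```smalltalk
Bexp>>describeContextWith: anArgument
	"Hypothetical helper: answer a few facts about the context of this very execution."
	| temp |
	temp := anArgument * 2.
	^ { thisContext receiver == self.
		thisContext method selector.
		thisContext arguments.
		thisContext tempNamed: 'temp' }
```

Evaluating Bexp new describeContextWith: 21 should answer an array holding true, #describeContextWith:, #(21) and 42.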
Figure 14.5: The pc variable holds 27 because the last (bytecode) instruction
executed was the message send inspect.
Let's look at the following example. When you execute it, just press "ok" in
the dialogs popping up.
| homeContext b1 |
homeContext := thisContext.
b1 := [| b2 |
self assert: thisContext closure == b1.
self assert: b1 outerContext == homeContext.
self assert: b1 home = homeContext.
b2 := [self assert: thisContext closure == b2.
self assert: b2 outerContext closure outerContext == homeContext].
self assert: b2 home = homeContext.
b2 value].
b1 value
14.6
Message execution
The Virtual Machine represents execution state as context objects, one per
method or block currently executed (the word activated is also used). In
Pharo, method and block executions are represented by MethodContext instances. In the rest of this chapter we survey contexts, method execution,
and block closure execution.
Sending a message
To send a message to a receiver, the VM has to:
1. Find the class of the receiver using the receiver object's header.
Sketch of implementation
Temporaries and arguments for blocks are handled the same way as in methods. Arguments are passed on the stack and temporaries are held in the
corresponding context. Nevertheless, a block can access more variables than
a method: a block can refer to arguments and temporaries from the enclosing method. As we have seen before, blocks can be passed around freely
and activated at any time. In all cases, the block can access and modify the
variables from the method it was defined in.
Let us consider the example shown in Figure 14.7. The temp variable used
in the block of the exampleReadInBlock method is a non-local or remote variable.
temp is initialized and changed in the method body and later on read in the
block. The actual value of the variable is not stored in the block context but
in the defining method context, also known as home context. In a typical
implementation the home context of a block is accessed through its closure.
This approach works well if all objects are first-class objects, including the
method and block contexts. Blocks can be evaluated outside their home context and still refer to remote variables. Hence all home contexts might outlive
the method activation.
point. Combined with the typical coding practice of using small methods
that call many other objects, Pharo can generate a lot of contexts.
The most efficient way to deal with method contexts is to not create them
at all. At the VM level, this is done by using real stack frames. Method contexts can be easily mapped to stack frames: whenever we call a method we
create a new frame, whenever we return from a method we delete the current
frame. In that respect Pharo is not very different from C. This means whenever we return from a method the method context (stack frame) is immediately removed. Hence no high-level garbage collection is needed. Nevertheless, using the stack gets much more complicated when we have to support
blocks.
As mentioned before, method contexts that are used as home contexts
might outlive their activation. If method contexts worked as we explained up
to now, we would have to check, each time a stack frame is removed, whether
it is used as a home context. This comes with a big performance penalty. Hence the next step
in using a stack for contexts is to make sure method contexts can be safely
removed when we return from a method.
Figure 14.8 shows how non-local variables are no longer directly
stored in the home context, but in a separate remote array which is heap
allocated.
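The idea of a remote array can be imitated by hand. The following playground sketch is only an emulation of the principle, not what the compiler literally generates: the shared variable is replaced by slot 1 of an explicitly heap-allocated Array, and the names remote, writer and reader are ours.

```smalltalk
| remote writer reader |
"One slot per variable shared between the method body and its blocks."
remote := Array new: 1.
remote at: 1 put: 0.
writer := [ remote at: 1 put: 2 ].
reader := [ remote at: 1 ].
reader value.   "0"
writer value.
reader value.   "2"
```

Because both blocks hold a reference to the array rather than to a stack frame, the frame of the defining method can be discarded on return without affecting them.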
Figure 14.8: How the VM stores remote variables so that they continue to
live when a method returns.
14.7
Chapter conclusion
In this chapter we learned how to use blocks, also called lexical closures, and
how they are implemented. We saw that we can use a block even if the
method defining it has returned. A block can access its own variables and
also non-local variables: instance variables, temporaries and arguments of
the defining method. We also saw how blocks can terminate a method and
return a value to the sender. We say that these blocks are non-local returning blocks and that some care has to be taken to avoid errors: a block cannot
terminate a method that has already returned. Finally, we showed what
contexts are and how they play an important role in block creation and
execution. We showed what the thisContext pseudo-variable is and how to use it
to get information about the executing context and potentially change it.
We thank Eliot Miranda for the clarifications.
Chapter 15
15.1
Let's start with some simple math. In the digital world, information is encoded as powers of 2. Nothing really new. In Smalltalk, raising a number to
a power is performed by sending the message raisedTo: to a number. Here are
some examples. Figure 15.1 shows the powers of 2.
2 raisedTo: 0
1
2 raisedTo: 2
4
2 raisedTo: 8
256
Figure 15.2: $13 = 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0$.
Binary notation
Pharo has a syntax for representing numbers in different bases. We write
2r1101 where 2 indicates the base or radix, here 2, and the rest the number
expressed in this base. Note that we could also write 2r01101 or 2r0001101
since this notation follows the convention that the least significant bit is the
rightmost one.
2r1101
13
13 printStringBase: 2
'1101'
Integer readFrom: '1101' base: 2
13
Note that the last two messages printStringBase: and readFrom:base: do not
reflect the internal encoding of negative numbers, as we will see later.
-2 printStringBase: 2 returns '-10' but this is not the internal number representation (known as two's complement). These messages just print/read the
number in a given base.
The radix notation can be used to specify numbers in different bases. Obviously 15 written in decimal base (10r15) returns 15, while 15 in base 16
returns 16 + 5 = 21 as illustrated by the following expressions.
10r15
15
16r15
21
15.2
Since integers are represented as sequences of bits, if we shift all the bits
by a given amount we obtain another integer. Shifting bits is equivalent
to multiplying or dividing by a power of two. Figure 15.3 illustrates this
point. Smalltalk offers three messages to shift bits: >> aPositiveInteger, <<
aPositiveInteger and bitShift: anInteger. >> divides the receiver by a power of two,
while << multiplies it by a power of two.
The following examples show how to use them.
2r000001000
8
2r000001000 >> 1
"we divide by two"
4
(2r000001000 >> 1) printStringBase: 2
'100'
2r000001000 << 1
"we multiply by two"
16
The message bitShift: is equivalent to >> and <<, but it uses negative and
positive integers to indicate the shift direction. A positive argument offers
the same behavior as <<, multiplying the receiver by a power of 2. A negative
argument behaves like >>.
2r000001000
8
2r000001000 bitShift: -1
4
2r000001000 bitShift: 1
16
The previous examples only show bit shifting on numbers with one or two
bits set, but there is no constraint at this level. The complete sequence of bits
can be shifted as shown with 2r010100000000 below and in Figure 15.4.
(2 raisedTo: 8) + (2 raisedTo: 10)
1280
2r010100000000
1280
2r010100000000 >> 8
5
So far, there is nothing really special. Though you should have learned
this in a basic math lecture, it is always good to walk on a hill before climbing a mountain.
15.3
Pharo offers common boolean operations for bit manipulation. Hence you
can send the messages bitAnd:, bitOr:, and bitXor: to numbers. They will apply
bit by bit the associated Boolean operation.
2r000001101 bitAnd: 2r01
1
2r000001100 bitAnd: 2r01
0
2r000001101 bitAnd: 2r1111
13
"1101"
bitAnd: can then be used to select part of a number. For example, bitAnd: 2r111
selects the three lowest bits.
2r000001101 bitAnd: 2r111
5
2r000001101 bitAnd: 2r0
0
2r0001001101 bitAnd: 2r1111
13
"1101"
2r000001101 bitAnd: 2r111000
8
"1000"
2r000101101 bitAnd: 2r111000
40
"101000"
Bit Access. Smalltalk lets you access bit information. The message bitAt: returns the value of the bit at a given position. It follows the Pharo convention
that collection indexes start at one.
2r000001101 bitAt: 1
1
2r000001101 bitAt: 2
0
2r000001101 bitAt: 3
1
2r000001101 bitAt: 4
1
2r000001101 bitAt: 5
0
With Pharo you can access the full environment and learn from the system itself. Here is the implementation of the method bitAt: on the Integer class.
Integer>>bitAt: anInteger
"Answer 1 if the bit at position anInteger is set to 1, 0 otherwise.
self is considered an infinite sequence of bits, so anInteger can be any strictly positive
integer.
Bit at position 1 is the least significant bit.
Negative numbers are in two-complements.
This is a naive implementation that can be refined in subclass for speed"
^ (self bitShift: 1 - anInteger) bitAnd: 1
We shift the receiver to the right by the requested position minus one (hence
the argument 1 - anInteger, which is negative or zero), so that the requested bit
ends up in the least significant position. Imagine that we have 2r000001101:
when we evaluate 2r000001101 bitAt: 5, the receiver is shifted right by 4 positions.
A bitAnd: 1 then selects that bit: it returns 1 if the bit was 1 and 0 otherwise,
that is, the value of the bit. Doing a bitAnd: 1 amounts to asking whether there
is a 1 in the least significant bit.
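To make the two steps concrete, the shift and the mask can be evaluated separately in a workspace. This is only an illustrative trace of the expression used by bitAt:, applied to the same receiver 2r000001101; the expected values are given as comments.

```smalltalk
"Bit 3 of 2r1101: shift right by 3 - 1 = 2 positions, then keep the lowest bit."
2r000001101 bitShift: 1 - 3.                "3, that is 2r11"
(2r000001101 bitShift: 1 - 3) bitAnd: 1.    "1"

"Bit 5 of 2r1101: shift right by 4 positions; no bit set remains."
2r000001101 bitShift: 1 - 5.                "0"
(2r000001101 bitShift: 1 - 5) bitAnd: 1.    "0"
```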
Again, nothing really special here, but this was to refresh our memories.
Now we will see how numbers are internally encoded in Pharo using 2's
complement. We will start by understanding the 10's complement and then look
at 2's complement.
15.4
Some books propose another equivalent way of computing the 10's complement: (1) All the zeros at the right-hand end of the number remain as
zeros, (2) The rightmost non-zero digit d of the number is replaced by 10 - d,
and (3) Each other digit d is replaced by 9 - d.
Computer scientists will probably prefer the first way since it is more
regular and adding 1 is cheaper than making more tests.
Subtraction at work
The key point of complement techniques is to convert subtractions into additions. Let us check that.
Examples. Suppose we want to perform the subtraction 8 - 3 = 5. We will
transform such a subtraction into an addition using the 10's complement.
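As a minimal sketch of this transformation, assuming single-digit operands so that the complement is taken with respect to 10 (the variable names are only illustrative), the subtraction can be replayed in a workspace as an addition followed by dropping the carry digit:

```smalltalk
| complement sum |
complement := 10 - 3.     "10's complement of 3 on one digit: 7"
sum := 8 + complement.    "15"
sum \\ 10.                "5: dropping the carry digit gives 8 - 3"
```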
15.5
Negative numbers
To know the value of a positive number is simple: we just add all the powers
of 2 given by the binary representation as explained at the beginning of this
chapter. Getting the value of a negative number is also quite simple: we do the
same except that we count the sign bit as negative and all the other ones as
positive. The sign bit is the most significant bit, i.e., the bit that represents the
largest value (see Figure 15.5). For example, on an 8 bit representation it is
the one associated with the weight $2^7$.
Let us illustrate that: -2 is represented on an 8 bit encoding as: 1111 1110.
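The rule can be checked by hand in a workspace. This is only a sketch: Pharo integers are not limited to 8 bits, so the literal below is simply the positive number 254, and the 8 bit interpretation is done explicitly with two masks (the variable names are ours).

```smalltalk
| bits signBit others |
bits := 2r11111110.
signBit := bits bitAnd: 2r10000000.    "128: the sign bit is set"
others := bits bitAnd: 2r01111111.     "126 = 64 + 32 + 16 + 8 + 4 + 2"
others - signBit.                      "-2: the sign bit counts as negative"
```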
15.6
Now we have all the pieces of the puzzle: we know how we can encode
positive and negative numbers, we know how to use the complement to
turn a subtraction into an addition. Let us see how the 2's complement is
used to negate numbers and perform subtraction.
The 2's complement is a common method to represent signed integers.
The advantages are that addition and subtraction are implemented without
having to check the sign of the operands and 2's complement has only one
representation for zero (avoiding negative zero). Adding numbers of different sign encoded using 2's complement does not require any special processing: the sign of the result is determined automatically. The 2's complement
of a positive number represents the negative form of that number.
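A small workspace sketch of negation by complement, assuming the usual recipe "invert all the bits, then add one" and using the standard Integer message bitInvert:

```smalltalk
5 bitInvert.         "-6: every bit of 5 is inverted"
5 bitInvert + 1.     "-5: inverting and adding one negates the number"
-5 bitInvert + 1.    "5: the same operation leads back to the positive form"
```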
'0000000000000000000000000000010'
---------110001
'0000000000000000000000000110001'
(2r110110 bitString)
'0000000000000000000000000110110'
2r101 bitString
'0000000000000000000000000000101'
2r101 negated bitString
'1111111111111111111111111111011'
The case where the result is a negative number is also well handled. For
example, if we want to compute 15 - 35, we should get -20 and this is what
we get. Let us see that: 15 is encoded as 0000 1111 and 35 as 0010 0011. Now
the two's complement of 35 is 1101 1101.
 0011 111  (carry)
 0000 1111
+1101 1101
----------
 1110 1100
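The same addition can be replayed in a workspace. This is a sketch under the assumption of an 8 bit encoding, which is simulated here by masking the sum with 16rFF and then reading the sign bit by hand:

```smalltalk
| sum |
sum := (2r00001111 + 2r11011101) bitAnd: 16rFF.    "236"
sum printStringBase: 2.                              "'11101100'"
sum - 256.                                           "-20: the sign bit (weight 128) is set"
```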
15.7
SmallIntegers in Pharo
-1073741824 class
SmallInteger
2 class maxVal
returns 1073741823
-1 * (2 raisedTo: (31-1))
-1073741824
(2 raisedTo: 30) - 1
1073741823
(2 raisedTo: 30) - 1 = SmallInteger maxVal
true
15.8
Hexadecimal
We cannot finish this chapter without talking about hexadecimal. Pharo uses
the same syntax for hexadecimal as for binary. 16rF indicates that F is
encoded in base 16.
We can get the hexadecimal equivalent of a number using the message
hex. Using the message printStringHex we get the number printed in hexadecimal.
15 hex
'16rF'
15 printStringHex
'F'
16rF printIt
15
The following snippet lists some equivalence between a number and its
hexadecimal representation.
{(1->'16r1'). (2->'16r2'). (3->'16r3'). (4->'16r4'). (5->'16r5'). (6->'16r6'). (7->'16r7').
(8->'16r8'). (9->'16r9'). (10->'16rA'). (11->'16rB'). (12->'16rC'). (13->'16rD').
(14->'16rE'). (15->'16rF')}
15.9
Chapter summary
Pharo uses 2's complement encoding for its internal small integer representation and supports bit manipulation of their internal representation. This
Chapter 16
16.1
The first basic principle is to never compare floats for equality. Let's take a simple
case: the addition of two floats may not be equal to the float representing
their sum. For example 0.1 + 0.2 is not equal to 0.3.
(0.1 + 0.2) = 0.3
false
Hey, this is unexpected, you did not learn that in school, did you? This
behavior is surprising indeed, but it's normal since floats are inexact numbers. What is important to understand is that the way floats are printed
also influences our understanding. Some approaches print a simpler representation of reality than others. Early versions of Pharo printed 0.1 + 0.2
as 0.3; now it prints 0.30000000000000004. This change was guided
by the idea that it is better not to lie to the user. Showing the inexactness of
a float is better than hiding it because one day or another we can be deeply
bitten by it.
The method storeString also conveys that we are in the presence of two different numbers.
(0.1 + 0.2) storeString
'0.30000000000000004'
0.3 storeString
'0.3'
About closeTo:. One way to know if two floats are probably close enough
to look like the same number is to use the message closeTo:
(0.1 + 0.2) closeTo: 0.3
true
0.3 closeTo: (0.1 + 0.2)
true
The method closeTo: verifies that the two compared numbers differ by less than
0.0001, relative to the larger of the two magnitudes (or in absolute terms when
one of them is zero). Here is its source code.
closeTo: num
"are these two numbers close?"
num isNumber ifFalse: [^[self = num] ifError: [false]].
self = 0.0 ifTrue: [^num abs < 0.0001].
num = 0 ifTrue: [^self abs < 0.0001].
^self = num asFloat
or: [(self - num) abs / (self abs max: num abs) < 0.0001]
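The last line of closeTo: compares the difference relative to the larger of the two magnitudes. As a sketch, the same computation can be replayed by hand in a workspace (the variable names are only illustrative):

```smalltalk
| a b relativeDifference |
a := 0.1 + 0.2.
b := 0.3.
relativeDifference := (a - b) abs / (a abs max: b abs).
relativeDifference < 0.0001.    "true, although a = b is false"
```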
Now, if you execute the following line, you will see that the expressions
are not equal.
(0.1 asScaledDecimal: 2) + (0.2 asScaledDecimal: 2) = (0.3 asScaledDecimal: 2)
false
16.2
Dissecting a Float
$1 + \frac{0}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \frac{0}{2^4} + \frac{0}{2^5} + \frac{1}{2^6} + \cdots + \frac{0}{2^{52}}$
The mantissa value is thus between 1 (included) and 2 (excluded) for normal
numbers.
Float precision.
53
You can also retrieve the exact fraction corresponding to the internal representation of the Float:
11.125 asTrueFraction.
(89/8)
(#(0 2 3 6) detectSum: [:i | (2 raisedTo: i) reciprocal]) * (2 raisedTo: 3).
(89/8)
So far we've retrieved exactly the input we injected into the Float.
Are Float operations exact after all? Hem, no, we only played with fractions
having a power of 2 as denominator and a few bits in the numerator. If one of
these conditions is not met, we won't find any exact Float representation of
our numbers. For example, it is not possible to represent 1/5 with a finite
number of binary digits. Consequently, a decimal fraction like 0.1 cannot be
represented exactly with the above representation.
(1/5) asFloat = (1/5).
false
(1/5) = 0.2
false
Let us see in detail how we could get the fractional bits of 1/5 i.e., 2r1/2r101.
For that, we must lay out the division:
1
10
100
1000
-101
  11
  110
 -101
    1
    10
    100
    1000
    -101
      11
      110
     -101
        1
Divisor: 101    Quotient: 0.00110011...
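The long division above can be reproduced step by step in a workspace. The following is only a sketch (the variable names are ours): at each step the remainder is doubled, which corresponds to bringing down a zero, and a 1 is emitted whenever the divisor 5 (2r101) fits.

```smalltalk
| remainder bits |
remainder := 1.
bits := WriteStream on: String new.
8 timesRepeat: [
    remainder := remainder * 2.    "bring down a zero"
    remainder >= 5
        ifTrue: [ bits nextPut: $1. remainder := remainder - 5 ]
        ifFalse: [ bits nextPut: $0 ] ].
bits contents.    "'00110011': the pattern 0011 repeats forever"
```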
That's the bit pattern we expected, except that the last bits 001 have been
rounded up to 010. This is the default rounding mode of Float, round
to nearest even. We now understand why 0.2 is represented inexactly in the machine. 0.1 has the same mantissa, and its exponent is -4.
0.2 significand
1.6
0.1 significand
1.6
0.2 exponent
-3
0.1 exponent
-4
So, when we entered 0.1 + 0.2, we didn't get exactly (1/10) + (1/5). Instead of that we got:
0.1 asTrueFraction + 0.2 asTrueFraction.
(10808639105689191/36028797018963968)
But that's not the whole story... Let us inspect the bit pattern of the above fraction, and check the span of this bit pattern, that is the position of the highest bit
set to 1 (leftmost) and the position of the lowest bit set to 1 (rightmost):
10808639105689191 printStringBase: 2.
'100110011001100110011001100110011001100110011001100111'
10808639105689191 highBit.
54
10808639105689191 lowBit.
1
36028797018963968 printStringBase: 2.
'10000000000000000000000000000000000000000000000000000000'
16.3
One of the biggest traps we learned with the above example is that despite
the fact that 0.1 is printed '0.1' as if it were exact, it's not. The name
absPrintExactlyOn:base: used internally by printString is a bit confusing: it does
not print exactly, but it prints the shortest decimal representation that will
be rounded to the same Float when read back (Pharo always converts the
decimal representation to the nearest Float).
Another message exists to
This means that the fraction denominator is $2^{55}$ and that you need 55
decimal digits after the decimal point to really print the internal representation
of 0.1 exactly.
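The claim about the denominator can be checked directly with asTrueFraction, which was introduced earlier in this chapter. A minimal workspace sketch:

```smalltalk
0.1 asTrueFraction denominator = (2 raisedTo: 55).    "true"
(1/10) = 0.1 asTrueFraction.                            "false: the Float is not exactly 1/10"
```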
16.4
While float equality is known to be evil, you have to pay attention to other
aspects of floats. Let us illustrate that point with the following example.
2.8 truncateTo: 0.01
2.8000000000000003
2.8 roundTo: 0.01
2.8000000000000003
It is surprising but not wrong that 2.8 truncateTo: 0.01 does not return 2.8 but
2.8000000000000003. This is because truncateTo: and roundTo: perform several
operations on floats: inexact operations on inexact numbers can lead to cumulative rounding errors as you saw above, and that's just what happens
again.
Even if you perform the operations exactly and then round to nearest
Float, the result is inexact because of the initial inexact representation of 2.8
and 0.01.
(2.8 asTrueFraction roundTo: 0.01 asTrueFraction) asFloat
2.8000000000000003
Using 0.01s2 rather than 0.01 lets this example appear to work:
2.80 truncateTo: 0.01s2
2.80s2
But it's just a case of luck: the fact that 2.8 is inexact is enough to cause
other surprises as illustrated below:
2.8 truncateTo: 0.001s3.
2.799s3
2.8 < 2.800s3.
true
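One way to avoid these surprises, when exactness matters more than speed, is to stay with exact Fractions from the start instead of Floats. A minimal sketch (the choice of 28/10 and 1/100 as exact stand-ins for 2.8 and 0.01 is ours, not the chapter's):

```smalltalk
(28/10) truncateTo: (1/100).    "(14/5), which is exactly 2.8"
(28/10) roundTo: (1/100).       "(14/5)"
```

Since both operands are exact, no rounding error is introduced before or during the operation.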
16.5
To add a nail to the coffin, let's play a bit more with inexact representations.
Let us try to see the difference between different numbers:
{
((2.8 asTrueFraction roundTo: 0.01 asTrueFraction) - (2.8 predecessor)) abs -> -1.
((2.8 asTrueFraction roundTo: 0.01 asTrueFraction) - (2.8)) abs -> 0.
((2.8 asTrueFraction roundTo: 0.01 asTrueFraction) - (2.8 successor)) abs -> 1.
} detectMin: [:e | e key ]
0.0->1
(2.8
If you want to know how far it is, then get an idea with:
((2.8 asTrueFraction roundTo: 0.01 asTrueFraction) - (2.8 successor asTrueFraction))
asFloat
-2.0816681711721685e-16
16.6
Chapter summary
Part V
Tools
Chapter 17
Profiling Applications
Since the beginning of software engineering, programmers have faced
issues related to application performance. Although programming
environments have greatly improved to support a better and faster
development process, addressing performance issues when programming
still requires quite some dexterity.
In principle, optimizing an application is not particularly difficult. The
general idea is to make slow and frequently called methods either faster or
less frequently called. Note that optimizing an application usually makes it more complex. It is therefore recommended to optimize an application
only when the requirements for it are well understood and addressed. In
other words, you should optimize your application only when you are sure of
what it is supposed to do. As Kent Beck famously formulated: 1 - Make It
Work, 2 - Make It Right, 3 - Make It Fast.
17.1
Profiling an application is a term commonly employed that refers to obtaining dynamic information from a controlled program execution. The obtained
information is intended to provide important hints on how to improve the
program execution. These hints are usually numerical measurements, easily
comparable from one program execution to another.
In this chapter, we will consider measurements related to method execution time and memory consumption. Note that other kinds of information
may be extracted from a program execution, in particular the method call
graph.
It is interesting to observe that a program execution usually follows the
universal 80-20 rule: only a small fraction of the methods (let's
say 20%) consume the largest part of the available resources (80% of memory
and CPU consumption). Optimizing an application is therefore essentially a matter of
tradeoffs. In this chapter we will see how to use the available tools
to quickly identify these 20% of methods and how to measure the progress
brought by the program enhancements we make.
Experience shows that having unit tests is essential to ensure that we do
not break the program semantics when optimizing it. When replacing an
algorithm by another, we ought to make sure that the program still does what
it is supposed to do.
17.2
A simple example
Although the difference between these two executions is only about a few
hundred milliseconds, opting for one method instead of the other could
significantly slow your application!
Let's scrutinize the definition of select:thenCollect:. A naive and non-optimized implementation is found in Collection. (Remember that Collection
is the root class of the Pharo collection library.) A more efficient implementation is defined in OrderedCollection, which takes into account the structure of
an ordered collection to efficiently perform this operation.
Collection>>select: selectBlock thenCollect: collectBlock
"Utility method to improve readability."
^ (self select: selectBlock) collect: collectBlock
As you have probably guessed already, other collections such as Set and
Dictionary do not benefit from an optimized version. We leave as an exercise an efficient implementation for other abstract data types. As part of the
community effort, do not forget to submit your contribution to Pharo if you
come up with an optimized and better version of select:thenCollect: or other
methods. The Pharo team really values such effort.
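As a hint for the exercise, here is a minimal workspace sketch of the single-pass idea: instead of building an intermediate collection with select: and traversing it again with collect:, each element is tested and transformed in one traversal. The sample data and blocks are purely illustrative, and this is not the implementation found in OrderedCollection.

```smalltalk
| source result |
source := #(1 2 3 4 5 6).
result := OrderedCollection new.
"One traversal: test each element and transform it immediately."
source do: [ :each |
    each even ifTrue: [ result add: each * each ] ].
"Same elements as the two-pass version."
result = (source select: [ :each | each even ] thenCollect: [ :each | each * each ]) asOrderedCollection.    "true"
```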
The method bench. When sent to a block, the bench message estimates how
many times this block is evaluated per second. For example, the expression
[ 1000 factorial ] bench says that 1000 factorial may be executed approximately 350
times per second.
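Both quick measurements can be tried in a workspace. The figures depend on your machine and on the Pharo version, so no expected values are given here:

```smalltalk
[ 1000 factorial ] bench.        "a report of how many evaluations fit in one second"
[ 1000 factorial ] timeToRun.    "the time taken by a single evaluation"
```

bench is convenient for very short pieces of code, since it repeats the block for you, whereas timeToRun measures one evaluation only.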
17.3
MessageTally
MessageTally is implemented as a single class having the same name. Using it is quite simple. The message spyOn: needs to be sent to MessageTally
with a block expression as argument to obtain a detailed execution analysis. Evaluating MessageTally spyOn: ["your expression here"] opens a window that
contains the following information:
1. a hierarchy list showing the methods executed with their associated
execution time during the expression execution.
2. leaf methods of the execution. A leaf method is a method that does not
invoke other methods (e.g., primitive, accessors).
3. statistics about the memory consumption and garbage collector involvement.
Each of these points will be described later on.
Figure 17.1 shows the result of the expression MessageTally spyOn: [20
timesRepeat: [Transcript show: 1000 factorial printString]]. The message spyOn: executes the provided block in a new process. The analysis focuses on one process only, the one that executes the block to profile. The message spyAllOn:
profiles all the processes that are active during the execution. This is useful
to analyze the distribution of the computation over several processes.
A tool a bit less crude than MessageTally is TimeProfileBrowser. In addition, it shows
the implementation of the executed methods (Figure 17.2).
TimeProfileBrowser understands the message spyOn:. This means that in the source code below, MessageTally can be replaced with TimeProfileBrowser to
obtain the better user interface.
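For instance, the expression profiled in Figure 17.1 can be handed to the browser instead. This sketch assumes that TimeProfileBrowser is present in your image, which may not be the case in every Pharo version:

```smalltalk
TimeProfileBrowser spyOn: [
    20 timesRepeat: [ Transcript show: 1000 factorial printString ] ].
```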
(Figure 17.3). Start profiling all Processes creates a block from a text selection
and invokes spyAllOn:. The entry Start profiling UI profiles the user interface
process. This is quite handy when debugging a user interface!
Via the Test Runner. As the size of an application grows, unit tests
usually become good candidates for code profiling. Running tests often
becomes rather tedious when the time to run them gets too long. The Test
Runner in Pharo offers a button Run Profiled (Figure 17.4).
Pressing this button runs the selected unit tests and generates a message
tally report.
17.4
MessageTally spyOn:
[ 500 timesRepeat: [
| str |
str := ''.
9000 timesRepeat: [ str := str, 'A' ]]].
**Memory**
    old         +0 bytes
    young       +9,536 bytes
    used        +9,536 bytes
    free        -9,536 bytes
**GCs**
    full        0 totalling 0ms (0.0% uptime)
    incr        9707 totalling 7,985ms (16.0% uptime), avg 1.0ms
    tenures     0
    root table  0 overflows
The first line gives the overall execution time and the number of samplings (also called tallies; we will come back to sampling at the end of the
chapter).
This tree shows that the interpreter spent 29.7% of its time executing primitives. 11.5% of the total execution time is spent in the method
SequenceableCollection>>copyReplaceFrom:to:with:. This method is called when
concatenating character strings using the comma message (,), itself indirectly
invoking new: and some virtual machine primitives.
The execution takes 11.5% of the execution time, which means that the interpreter effort is shared with other processes. The invocation chain from the
code to the primitives is relatively short. Reaching hundreds of nested calls
is not exceptional for most applications. We will optimize this example later
on.
**Memory**
The statistical part on memory consumption reports the observed changes in
the quantity of memory allocated and the garbage collector usage. To fully
understand this information, one needs to keep in mind that Pharo's garbage
collector (GC) is a scavenging GC, relying on the principle that an old object
has a greater chance to live even longer. It is designed around the observation that
an old object will probably be kept referenced in the future. On the contrary,
a young object has a greater chance to be quickly dereferenced.
Several memory zones are considered, and the migration of a young object to the space dedicated to old objects is called tenuring. (The metaphor
comes from American academia, where tenure is the granting of a permanent
position.)
An example of the memory analysis produced by MessageTally:
**Memory**
    old         +0 bytes
    young       +9,536 bytes
    used        +9,536 bytes
    free        -9,536 bytes
MessageTally describes the memory usage using four values:
1. the old value is about the growth of the memory space dedicated to old
objects. An object is qualified as old when its physical memory location is in the old memory space. This happens when a full garbage
collection is triggered, or when there are too many object survivors (according to some threshold specified in the virtual machine). This memory space is cleaned by a full garbage collection only. (An incremental
GC therefore does not reduce its size.)
An increase of the old memory space is likely to be due to a memory leak:
the virtual machine is unable to release memory, promoting young objects as old.
2. the young value tells about the increase of the memory space dedicated
to young objects. When an object is created, it is physically located in
this memory space. The size of this memory space changes frequently.
3. the used value is the total amount of used memory.
4. the free value is the remaining amount of memory available.
In our example, none of the objects created during the execution have
been promoted as old. 9,536 bytes are used by the current process, located
in the young memory space. The amount of available memory has been
reduced accordingly.
**GCs**
The **GCs** section provides statistics about the garbage collector. An example of a
garbage collector report is:
**GCs**
    full        0 totalling 0ms (0.0% uptime)
    incr        9707 totalling 7,985ms (16.0% uptime), avg 1.0ms
    tenures     1 (avg 9707 GCs/tenure)
    root table  0 overflows
17.5
Illustrative analysis
Understanding the results obtained when profiling is the very first step when
one wants to optimize an application. However, as you probably started to
feel, understanding why a computation is costly is not trivial. Based on a
number of examples, we will see how comparing different profiling results
greatly helps to identify costly message calls.
The method "," is known to be slow since it creates a new character string
and copies both the receiver and the argument into it. Using a Stream is a significantly faster approach to concatenate character strings. However, nextPut:
and nextPutAll: must be carefully employed!
Using a Stream for string concatenation. At first glance, one could
think that creating a stream is costly since it is frequently used with relatively
slow inputs and outputs (e.g., network socket, disk accesses, Transcript). But
replacing the string concatenation employed in the previous example by a
stream operation is almost 10 times faster! This is easily understandable
since concatenating a character string 9000 times creates 8999 intermediate objects, each being filled with the content of another. Using a stream,
we simply have to append a character at each iteration.
MessageTally spyOn:
[ 500 timesRepeat: [
| str |
str := WriteStream on: (String new).
9000 timesRepeat: [ str nextPut: $A ]]].
--------------------------------
**Leaves**
33.0% {266ms} SmallInteger(Integer)>>timesRepeat:
21.2% {171ms} UndefinedObject>>DoIt
**Memory**
    old         +0 bytes
    young       -18,272 bytes
    used        -18,272 bytes
    free        +18,272 bytes
**GCs**
    full        0 totalling 0ms (0.0% uptime)
    incr        5 totalling 7ms (3.0% uptime), avg 1.0ms
    tenures     0
    root table  0 overflows
For this example, it is possible to improve the script by using the method
atAllPut:. The script below takes only a couple of milliseconds.
MessageTally spyOn:
[ 500 timesRepeat: [
| str |
str := String new: 9000.
str atAllPut: $A ]].
valuable. The time taken with 9000 iterations is 2.7 times slower than with
500. Using string concatenation (i.e., using the , method) instead of a
stream widens the gap by a factor of 10. This experiment clearly illustrates
the importance of using appropriate tools to concatenate strings.
The duration of the profiled execution is also an important quality factor for
the result. MessageTally employs a sampling technique to profile code. By
default, MessageTally samples the currently executing thread every millisecond.
It is therefore necessary that all the methods involved in the
computation are executed for a fair amount of time to appear in the result
report. If the application to profile is very short (a few milliseconds only), then
executing it a number of times helps improve the accuracy of the report.
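The following sketch contrasts the two situations. The expression and the repetition count are arbitrary choices made for the illustration:

```smalltalk
"A single evaluation is too short to be sampled reliably."
MessageTally spyOn: [ 100 factorial ].

"Repeating it gives the sampler enough ticks to attribute time to methods."
MessageTally spyOn: [ 10000 timesRepeat: [ 100 factorial ] ].
```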
17.6
Counting messages
The downside of tallySends: is the time taken to execute the provided block.
The block to profile is executed by an interpreter written in Pharo, which
is slower than the one of the virtual machine. A piece of code profiled by
tallySends: is about 200 times slower. The interpreter is available from the
method ContextPart>>runSimulated: aBlock contextAtEachStep: block2.
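A minimal usage sketch follows. Given the slowdown mentioned above, the block is deliberately kept small:

```smalltalk
"Counts the messages sent while evaluating the block, instead of sampling time."
MessageTally tallySends: [ 1000 factorial ].
```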
17.7
Memoized Fibonacci
Integer>>fibSlow
    self assert: self >= 0.
    (self <= 1) ifTrue: [ ^ self].
    ^ (self - 1) fibSlow + (self - 2) fibSlow
The method fibSlow is relatively inefficient. Each recursion implies a duplication of the computation: the same results are computed again by each
branch of the recursion.
A more efficient (but also slightly more complicated) version is obtained
by using a cache that keeps intermediate computed values. The advantage
is that computations are not duplicated since each value is computed once. This
classical way of optimizing programs is called memoizing.
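The idea can be sketched in a workspace with a Dictionary used as the cache. This is an illustrative script with names of our own choosing (cache, fib), written as a recursive block rather than as a method on Integer:

```smalltalk
| cache fib |
cache := Dictionary new.
fib := nil.
fib := [ :n |
    n <= 1
        ifTrue: [ n ]
        ifFalse: [
            cache at: n ifAbsent: [
                | value |
                "Computed once, then stored for all later requests."
                value := (fib value: n - 1) + (fib value: n - 2).
                cache at: n put: value.
                value ] ] ].
fib value: 30.    "832040"
```

Each value from 2 to 30 is computed once and then served from the cache, so the number of additions grows linearly with the argument instead of exponentially.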
Integer>>fib
17.8
It is often important to know the number of instances and the memory consumption of a given class. The class SpaceTally offers this functionality.
The expression SpaceTally new printSpaceAnalysis runs over all the classes
of the system and gathers for each of them its code size, the number of instances and the total memory space taken by the instances. The result is
sorted by the total memory space taken by instances and is stored in a file
named STspace.text, located next to the Pharo image.
It is not surprising to see that strings, compiled methods and bitmaps
represent the largest part of the Pharo memory. Similar proportions of compiled code, strings and bitmaps may be found on other platforms and for diverse
applications.
SpaceTally's output is structured as follows:
Class             inst space
ByteString        9133154
Bitmap            6122156
CompiledMethod    3307151
Array             3071680
ByteSymbol        914367
...
Each line represents the memory analysis of a Pharo class. Classes are
ordered by the space they occupy. The class ByteString describes strings. It
is common for strings to consume one third of the memory. Code space
gives the number of bytes used by the class and its metaclass. It does not
include the space used by class variables. The value is given by the method
Behavior>>spaceUsed.
17.9
Few advices
17.10
MessageTally is a gorgeous example of how to use Pharo's reflective capabilities. The method spyEvery: millisecs on: aBlock contains the whole profiling
logic. This method is indirectly called by spyOn:. The millisecs value is the
number of milliseconds between each sample. It is set to 1 by default. The
block to be profiled is aBlock.
The essence of the profiling activity is given by the following code excerpt:
observedProcess := Processor activeProcess.
Timer := [
    [ true ] whileTrue: [
        | startTime |
        startTime := Time millisecondClockValue.
        myDelay wait.
        self
            tally: Processor preemptedProcess suspendedContext
            in: (observedProcess == Processor preemptedProcess
                ifTrue: [ observedProcess ] ifFalse: [ nil ])
            by: (Time millisecondClockValue - startTime) // millisecs ].
    nil] newProcess.
Timer priority: Processor timingPriority-1.
Timer is a new process, set at a high priority, that is in charge of monitoring
aBlock. The process scheduler will therefore favor its activation (timingPriority is
the process priority of system processes). It creates an infinite loop that waits
for the necessary number of milliseconds (myDelay) before taking a snapshot of the
method call stack. The process to observe is observedProcess. It is the process
in which the message spyEvery: millisecs on: aBlock has been sent.
The idea of profiling is to associate to each method context a counter. This
association is realized with an instance of the class MessageTally (the class
defines the variables class, method and process).
At a regular interval (myDelay), the counter of each stack frame is incremented with the amount of elapsed milliseconds. The stack frame is
obtained by sending suspendedContext to the process that has just been preempted.
The method tally: context in: aProcess by: count increments each stack frame
by the amount of milliseconds given by count.
The memory statistics are obtained by taking the difference in consumed memory before and after the profiling. Smalltalk, an instance of the
class SmalltalkImage, contains many accessing methods to query the amount
of available memory.
17.11
Chapter summary
In this chapter, we saw the basics of profiling in Pharo. It presented the
functionalities of MessageTally and introduced a number of principles for resolving performance bottlenecks.
The methods timeToRun and bench offer simple benchmarking and
Chapter 18
PetitParser: Building
Modular Parsers
with the participation of:
Jan Kurs (kurs@iam.unibe.ch)
Guillaume Larcheveque (guillaume.larcheveque@gmail.com)
Lukas Renggli (renggli@gmail.com)
Building parsers to analyze and transform data is a common task in software development. In this chapter we present a powerful parser framework
called PetitParser. PetitParser combines many ideas from various parsing
technologies to model grammars and parsers as objects that can be reconfigured dynamically. PetitParser was written by Lukas Renggli as part of his
work on the Helvetia system, but it can be used as a standalone library.
18.1
Loading PetitParser
Enough talking, let's get started. PetitParser is developed in Pharo, and there
are also versions for Java and Dart available. A ready-made image can be
downloaded. To load PetitParser into an existing image, evaluate the following Gofer expression:
Script 18.1: Installing PetitParser
Gofer new
    smalltalkhubUser: 'Moose' project: 'PetitParser';
    package: 'ConfigurationOfPetitParser';
    load.
(Smalltalk at: #ConfigurationOfPetitParser) perform: #loadDefault.
Figure 18.1: Syntax diagram representation for the identifier parser defined in script 18.2
A graphical notation
Figure 18.1 presents a syntax diagram of the identifier parser. Each box represents a parser. The arrows between the boxes represent the flow in which
input is consumed. The rounded boxes are elementary parsers (terminals).
The squared boxes (not shown on this figure) are parsers composed of other
parsers (non terminals).
If you inspect the object identifier of the previous script, you'll notice that
it is an instance of PPSequenceParser. If you dive further into the object, you
will notice the following tree of different parser objects:
Script 18.3: Composition of parsers used for the identifier parser
PPSequenceParser (accepts a sequence of parsers)
    PPPredicateObjectParser (accepts a single letter)
    PPPossessiveRepeatingParser (accepts zero or more instances of another parser)
        PPPredicateObjectParser (accepts a single word character)
The root parser is a sequence parser because the , (comma) operator creates a sequence of (1) a letter parser and (2) zero or more word character
parsers. The root parser's first child is a predicate object parser created by the
#letter asParser expression. This parser is capable of parsing a single letter
as defined by the Character>>isLetter method. The second child is a repeating
parser created by the star call. This parser applies its child parser (another predicate object parser) as many times as possible on the input (i.e., it is a greedy parser).
Its child parser is a predicate object parser created by the #word asParser expression. This parser is capable of parsing a single digit or letter as defined
by the Character>>isDigit and Character>>isLetter methods.
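The tree described above can also be explored programmatically. A minimal sketch, assuming the identifier parser is built from the same expression used later in this section and that the children accessor of PPParser is available in your PetitParser version:

```smalltalk
| identifier |
"A letter followed by zero or more word characters."
identifier := #letter asParser , #word asParser star.

identifier class.                    "PPSequenceParser"
identifier children first class.     "PPPredicateObjectParser"
identifier children second class.    "PPPossessiveRepeatingParser"
```

Inspecting the parser in the image gives the same information, but evaluating such expressions is handy when writing tests about the shape of a grammar.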
If you are only interested in whether a given string (or stream) matches, you
can use the following constructs:
Script 18.6: Checking that some inputs are identifiers
identifier matches: 'foo'.
true
identifier matches: '123'.
false
identifier matches: 'foo()'.
true
The last result can be surprising: indeed, a parenthesis is neither a digit
nor a letter as was specified by the #word asParser expression. In fact, the
identifier parser matches 'foo' and this is enough for the PPParser>>matches:
call to return true. The result would be similar with the use of parse:, which
would return #($f #($o $o)).
If you want to be sure that the complete input is matched, use the message PPParser>>end as follows:
Script 18.7: Ensuring that the whole input is matched using PPParser>>end
identifier end matches: 'foo()'.
false
The PPParser>>end message creates a new parser that matches the end of
input. To be able to compose parsers easily, it is important that parsers do not
match the end of input by default. Because of this, you might be interested
in finding all the places that a parser can match using the messages
PPParser>>matchesSkipIn: and PPParser>>matchesIn:.
Script 18.8: Finding all matches in an input
identifier matchesSkipIn: 'foo 123 bar12'.
an OrderedCollection(#($f #($o $o)) #($b #($a $r $1 $2)))
identifier matchesIn: 'foo 123 bar12'.
an OrderedCollection(#($f #($o $o)) #($o #($o)) #($o #()) #($b #($a $r $1 $2))
#($a #($r $1 $2)) #($r #($1 $2)))
The PPParser>>matchesSkipIn: method returns a collection of arrays containing what has been matched. This method avoids parsing the same character
twice. The method PPParser>>matchesIn: does a similar job but returns a collection with all possible sub-parsed elements: e.g., evaluating identifier matchesIn:
'foo 123 bar12' returns a collection of 6 elements.
Similarly, to find all the matching ranges (index of first character and
index of last character) in the given input, one can use either
PPParser>>matchingSkipRangesIn: or PPParser>>matchingRangesIn: as shown by the script below:
Script 18.9: Finding all matched ranges in an input
identifier matchingSkipRangesIn: 'foo 123 bar12'.
an OrderedCollection((1 to: 3) (9 to: 13))
identifier matchingRangesIn: 'foo 123 bar12'.
an OrderedCollection((1 to: 3) (2 to: 3) (3 to: 3) (9 to: 13) (10 to: 13) (11 to: 13))
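Ranges are convenient when you want the matched text itself rather than the nested parse result. The following sketch reuses the identifier parser and the input of the script above; only the collect: step is new and it relies on standard collection and string messages:

```smalltalk
| identifier input |
identifier := #letter asParser , #word asParser star.
input := 'foo 123 bar12'.

"Turn each matched range into the corresponding substring of the input."
(identifier matchingSkipRangesIn: input)
    collect: [ :range | input copyFrom: range first to: range last ].
"Given the ranges (1 to: 3) and (9 to: 13), this answers 'foo' and 'bar12'."
```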
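Terminal Parsers        Description
$a asParser             Parses the character $a.
'abc' asParser          Parses the string 'abc'.
#any asParser           Parses any character.
#digit asParser         Parses one digit (0..9).
#letter asParser        Parses one letter (a..z and A..Z).
#word asParser          Parses a digit or a letter.
#blank asParser         Parses a space or a tabulation.
#newline asParser       Parses the carriage return or line feed characters.
#space asParser         Parses any white space character, including new lines.
#tab asParser           Parses a tab character.
#lowercase asParser     Parses a lowercase character.
#uppercase asParser     Parses an uppercase character.
nil asParser            Parses nothing.

Parser Combinators      Description
p1 , p2                 Parses p1 followed by p2 (sequence).
p1 / p2                 Parses p1; if that fails, parses p2 (ordered choice).
p star                  Parses zero or more p.
p plus                  Parses one or more p.
p optional              Parses p if possible.
p and                   Parses p but does not consume its input.
p negate                Consumes one element of input provided p does not match there.
p not                   Succeeds when p fails, without consuming any input.
p end                   Parses p and succeeds only at the end of the input.
p times: n              Parses p exactly n times.
p min: n max: m         Parses p at least n times and at most m times.
p starLazy: q           Like star, but stops consuming when q succeeds.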
Figure 18.2: Syntax diagram representation for the identifier2 parser defined in script 18.10
Parser action
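To define an action or transformation on a parser we can use one of the messages PPParser>>==>, PPParser>>flatten, PPParser>>token and PPParser>>trim,
listed in Table 18.3.

Action Parsers          Description
p flatten               Creates a string from the result of p.
p token                 Like flatten, but answers a PPToken holding the details.
p trim                  Trims white space before and after p.
p trim: trimParser      Trims whatever trimParser can parse.
p ==> aBlock            Performs the transformation given in aBlock.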
'ajka0'
Figure 18.3: Syntax diagram representation for the number parser defined in script 18.14
Script 18.12: Using PPParser>>trim to ignore spaces
|identifier|
identifier := (#letter asParser , #word asParser star) flatten.
identifier parse: ' ajka '
letter expected at 0
identifier trim parse: ' ajka '
'ajka'
Table 18.3 shows the basic elements to build parsers. There are a
few more well-documented and tested factory methods in the operators protocols of PPParser. If you want to know more about these factory methods,
browse these protocols. An interesting one is separatedBy:, which answers a
new parser that parses the input one or more times, with separations specified by another parser.
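As an illustration of separatedBy:, here is a minimal sketch that parses a comma-separated list of numbers. The variable names and the input string are chosen for the example only:

```smalltalk
| number list |
number := #digit asParser plus flatten.

"One or more numbers separated by commas; trim tolerates spaces around each comma."
list := number separatedBy: $, asParser trim.

list parse: '1, 22,333'.
"The answer interleaves the parsed elements and the separators:
 #('1' $, '22' $, '333')"
```

Since the separators are part of the answer, an action block attached with ==> is typically used to keep only the elements of interest.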
above, the next step is to define the productions for addition and multiplication in order of precedence. Note that we instantiate the productions as
PPDelegateParser upfront, because they recursively refer to each other. The
method #setParser: then resolves this recursion. The following script defines
three parsers for the addition, multiplication and parenthesis (see Figure 18.4
for the related syntax diagram):
Script 18.15: Parsing arithmetic expressions
term := PPDelegateParser new.
prod := PPDelegateParser new.
prim := PPDelegateParser new.
term setParser: (prod , $+ asParser trim , term ==> [ :nodes | nodes first + nodes last ])
    / prod.
prod setParser: (prim , $* asParser trim , prod ==> [ :nodes | nodes first * nodes last ])
    / prim.
prim setParser: ($( asParser trim , term , $) asParser trim ==> [ :nodes | nodes second ])
    / number.
The term parser is defined as being either (1) a prod followed by +, followed by another term or (2) a prod. In case (1), an action block asks the
parser to compute the arithmetic addition of the value of the first node (a
prod) and the last node (a term). The prod parser is similar to the term
parser. The prim parser is interesting in that it accepts left and right parentheses before and after a term and has an action block that simply ignores
them.
To understand the precedence of productions, see Figure 18.5. The root
of the tree in this figure (term) is the production that is tried first. A term is
either a + or a prod. The term production comes first because + has the lowest
priority in mathematics.
To make sure that our parser consumes all input we wrap it with the end
parser into the start production:
start := term end.
start parse: '1 + 2 * 3'.
7
start parse: '(1 + 2) * 3'.
9
Figure 18.4: Syntax diagram representation for the term, prod, and prim parsers defined in script 18.15
(Figure 18.5: tree of productions, from the root: term, prod, prim, parens, number)
18.2
In the previous section we saw the basic principles of PetitParser and gave
some introductory examples. In this section we are going to present a way
to define more complicated grammars. We continue where we left off with
the arithmetic expression grammar.
Writing parsers as a script as we did previously can be cumbersome,
especially when grammar productions are mutually recursive and refer to
each other in complicated ways. Furthermore, a grammar specified in a single
script makes it unnecessarily hard to reuse specific parts of that grammar.
Luckily there is PPCompositeParser to the rescue.
Again we start with the grammar for an integer number. Define the
method number as follows:
Script 18.18: Implementing our first parser as a method
ExpressionGrammar>>number
^ #digit asParser plus flatten trim ==> [ :str | str asNumber ]
Every production in ExpressionGrammar is specified as a method that returns its parser. Similarly, we define the productions term, prod, mul, and prim.
Productions refer to each other by reading the respective instance variable
of the same name and PetitParser takes care of initializing these instance
variables for you automatically. We let Pharo automatically add the necessary instance variables as we refer to them for the first time. We obtain the
following class definition:
Script 18.19: Creating a class to hold our arithmetic expression grammar
PPCompositeParser subclass: #ExpressionGrammar
instanceVariableNames: 'add prod term mul prim parens number'
classVariableNames: ''
poolDictionaries: ''
category: 'PetitTutorial'
Script 18.20: Defining more expression grammar parsers, this time with no associated action
ExpressionGrammar>>term
^ add / prod
ExpressionGrammar>>add
^ prod , $+ asParser trim , term
ExpressionGrammar>>prod
^ mul / prim
ExpressionGrammar>>mul
^ prim , $* asParser trim , prod
ExpressionGrammar>>prim
^ parens / number
ExpressionGrammar>>parens
^ $( asParser trim , term , $) asParser trim
Contrary to our previous implementation, we do not define the production actions yet (what we previously did by using PPParser>>==>), and we
factor out the parts for addition (add), multiplication (mul), and parentheses
(parens) into separate productions. This will give us better reusability later
on. For example, a subclass may override such methods to produce slightly
different production output. Usually, production methods are categorized in
a protocol named grammar (which can be refined into more specific protocol
names when necessary, such as grammar-literals).
Last but not least, we define the starting point of the expression grammar.
This is done by overriding PPCompositeParser>>start in the ExpressionGrammar
class:
Script 18.21: Defining the starting point of our expression grammar parser
ExpressionGrammar>>start
^ term end
Instantiating the ExpressionGrammar gives us an expression parser that returns a default abstract-syntax tree:
Script 18.22: Testing our parser on simple arithmetic expressions
parser := ExpressionGrammar new.
parser parse: '1 + 2 * 3'.
#(1 $+ #(2 $* 3))
parser parse: '(1 + 2) * 3'.
#(#($( #(1 $+ 2) $)) $* 3)
Script 18.23: Reusing the number parser from the ExpressionGrammar grammar
PPCompositeParser subclass: #MyNewGrammar
instanceVariableNames: 'number'
classVariableNames: ''
poolDictionaries: ''
category: 'PetitTutorial'
MyNewGrammar class>>dependencies
"Answer a collection of PPCompositeParser classes that this parser directly
depends on."
^ {ExpressionGrammar}
MyNewGrammar>>number
"Answer the same parser as ExpressionGrammar>>number."
^ (self dependencyAt: ExpressionGrammar) number
Defining an evaluator
Now that we have defined a grammar we can reuse this definition to implement an evaluator. To do this we create a subclass of ExpressionGrammar called
ExpressionEvaluator.
Script 18.24: Separating the grammar from the evaluator by creating a subclass
ExpressionGrammar subclass: #ExpressionEvaluator
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'PetitTutorial'
We then redefine the implementation of add, mul and parens with our evaluation semantics. This is accomplished by calling the super implementation
and adapting the returned parser as shown in the following methods.
Script 18.25: Refining the definition of some parsers to evaluate arithmetic expressions
ExpressionEvaluator>>add
^ super add ==> [ :nodes | nodes first + nodes last ]
ExpressionEvaluator>>mul
^ super mul ==> [ :nodes | nodes first * nodes last ]
ExpressionEvaluator>>parens
^ super parens ==> [ :nodes | nodes second ]
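With these three overrides in place the evaluator can be tried out directly. A short sketch, assuming the ExpressionGrammar and ExpressionEvaluator classes are defined exactly as in the scripts above:

```smalltalk
| evaluator |
evaluator := ExpressionEvaluator new.

evaluator parse: '1 + 2 * 3'.      "7: mul binds tighter than add"
evaluator parse: '(1 + 2) * 3'.    "9: parens answers the value of the enclosed term"
```

Note that number already converts its input with asNumber in ExpressionGrammar, which is why the action blocks can apply + and * to the nodes directly.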
Defining a Pretty-Printer
We can reuse the grammar for example to define a simple pretty printer. This
is as easy as subclassing ExpressionGrammar again!
Script 18.27: Separating the grammar from the pretty printer by creating a subclass
ExpressionGrammar subclass: #ExpressionPrinter
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'PetitTutorial'
ExpressionPrinter>>add
^ super add ==> [:nodes | nodes first , ' + ' , nodes third]
ExpressionPrinter>>mul
^ super mul ==> [:nodes | nodes first , ' * ' , nodes third]
ExpressionPrinter>>number
^ super number ==> [:num | num printString]
ExpressionPrinter>>parens
^ super parens ==> [:node | '(' , node second , ')']
This pretty printer can be tried out as shown by the following expressions.
Script 18.28: Testing our pretty printer on simple arithmetic expressions
parser := ExpressionPrinter new.
parser parse: '1+2 *3'.
'1 + 2 * 3'
'(1 + 2) * 3'
Script 18.30: Now our parser is also able to manage subtraction and division
(1/3)
18.3
Testing a grammar
It is then important that the test case class references the parser class:
this is done by overriding the PPCompositeParserTest>>parserClass method in
ExpressionGrammarTest:
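A minimal sketch of what such a test class could look like is given below. It assumes that PPCompositeParserTest provides parse:rule:, which parses a string with a single production and stores the answer in the inherited result variable. The input strings '123' and '123+77' are illustrative choices, picked to agree with the values checked by the evaluator tests later in this section.

```smalltalk
PPCompositeParserTest subclass: #ExpressionGrammarTest
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'PetitTutorial'

ExpressionGrammarTest>>parserClass
    "Tell the test framework which grammar to instantiate."
    ^ ExpressionGrammar

ExpressionGrammarTest>>testNumber
    "Parse the input with the number production only."
    self parse: '123' rule: #number

ExpressionGrammarTest>>testAdd
    "Parse the input with the add production only."
    self parse: '123+77' rule: #add
```

Testing one production at a time keeps failures local: a broken add rule does not hide behind a failure of start.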
These tests ensure that the ExpressionGrammar parser can parse some expressions using a specified production rule. Testing the evaluator and pretty
printer is similarly easy:
Script 18.34: Testing the evaluator and pretty printer
ExpressionGrammarTest subclass: #ExpressionEvaluatorTest
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'PetitTutorial'
ExpressionEvaluatorTest>>parserClass
^ ExpressionEvaluator
ExpressionEvaluatorTest>>testAdd
super testAdd.
self assert: result equals: 200
ExpressionEvaluatorTest>>testNumber
super testNumber.
self assert: result equals: 123
ExpressionGrammarTest subclass: #ExpressionPrinterTest
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'PetitTutorial'
ExpressionPrinterTest>>parserClass
^ ExpressionPrinter
ExpressionPrinterTest>>testAdd
super testAdd.
18.4
JSON consists of object definitions (between curly braces {}) and arrays
(between square brackets []). An object definition is a set of key/value
pairs whereas an array is a list of values. The previous JSON example then
represents an object (a person) with several key/value pairs (e.g., for the
person's first name, last name, and age). The address of the person is represented by another object while the phone number is represented by an array
of objects.
First we define a grammar as a subclass of PPCompositeParser. Let us call it
PPJsonGrammar
Figure 18.6: Syntax diagram representation for the JSON object parser defined in script 18.37
classVariableNames: 'CharacterTable'
poolDictionaries: ''
category: 'PetitJson-Core'
The only new thing here is the call to the PPParser>>separatedBy: convenience method, which answers a new parser that parses the receiver (a value
here) one or more times, separated by its parameter parser (a comma here).
Arrays are much simpler to parse, as depicted in script 18.38.
Script 18.38: Defining the JSON parser for array as represented in Figure 18.7
PPJsonGrammar>>array
    ^ $[ asParser token trim ,
        elements optional ,
        $] asParser token trim
PPJsonGrammar>>elements
    ^ value separatedBy: $, asParser token trim
Figure 18.7: Syntax diagram representation for the JSON array parser defined in script 18.38
Parsing values
In JSON, a value is either a string, a number, an object, an array, a Boolean
(true or false), or null. The value parser is defined as below and represented
in Figure 18.8:
Script 18.39: Defining the JSON parser for value as represented in Figure 18.8
PPJsonGrammar>>value
^ stringToken / numberToken / object / array /
trueToken / falseToken / nullToken
A string requires quite some work to parse. A string starts and ends with
double quotes. What is inside these double quotes is a sequence of characters. Any character can either be an escape character, an octal character, or a
normal character. An escape character is composed of a backslash immediately followed by a special character (e.g., '\n' to get a new line in the string).
An octal character (despite its name, a Unicode escape written in hexadecimal) is composed of a backslash, immediately followed by the
letter 'u', immediately followed by 4 hexadecimal digits. Finally, a normal
character is any character except a double quote (used to end the string) and
a backslash (used to introduce an escape character).
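The three kinds of characters can be sketched as standalone parsers in a workspace. This is an illustration under assumptions, not the definition used by PPJsonGrammar: it assumes that PPPredicateObjectParser class>>anyOf: and the #hex terminal parser are available in your PetitParser version, and it hard-codes the list of special characters instead of reading it from a dictionary.

```smalltalk
| charEscape charOctal charNormal char string |
"A backslash followed by one of the special characters (hard-coded here)."
charEscape := $\ asParser , (PPPredicateObjectParser anyOf: '\/"bfnrt').
"A backslash, the letter u, then exactly four hexadecimal digits."
charOctal := '\u' asParser , (#hex asParser times: 4).
"Any character that is neither a double quote nor a backslash."
charNormal := ($" asParser / $\ asParser) negate.

char := charEscape / charOctal / charNormal.
string := $" asParser , char star , $" asParser.

string end matches: '"a\nb"'.    "true: the backslash followed by n is an escape"
string end matches: '"a"b"'.     "false: the string ends at the second double quote"
```

The order of the choice matters little here because the three alternatives start with different input, except that both charEscape and charOctal begin with a backslash; charEscape simply fails on the letter u and the choice falls through to charOctal.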
Script 18.40: Defining the JSON parser for string as represented in Figure 18.9
PPJsonGrammar>>stringToken
^ string token trim
PPJsonGrammar>>string
^ $" asParser , char star , $" asParser
PPJsonGrammar>>char
Figure 18.8: Syntax diagram representation for the JSON value parser defined in script 18.39
Special characters allowed after a backslash and their meanings are defined in the CharacterTable dictionary that we initialize in the initialize class
method. Please note that the class-side initialize method is called when
the class is loaded into the system. If you have just created the initialize method,
the class was loaded without it. To execute it, you should evaluate
PPJsonGrammar initialize in your workspace.
Script 18.41: Defining the JSON special characters and their meaning
PPJsonGrammar class>>initialize
    CharacterTable := Dictionary new.
    CharacterTable
        at: $\ put: $\;
        at: $/ put: $/;
        at: $" put: $";
        at: $b put: Character backspace;
        at: $f put: Character newPage;
        at: $n put: Character lf;
        at: $r put: Character cr;
        at: $t put: Character tab
Figure 18.9: Syntax diagram representation for the JSON string parser defined in script 18.40
Figure 18.10: Syntax diagram representation for the JSON number parser defined in script 18.42
^ number token trim
PPJsonGrammar>>number
^ $- asParser optional ,
($0 asParser / #digit asParser plus) ,
($. asParser , #digit asParser plus) optional ,
(($e asParser / $E asParser) , ($- asParser / $+ asParser) optional , #digit asParser
plus) optional
The attentive reader will have noticed a small difference between the syntax diagram in Figure 18.10 and the code in script 18.42. Numbers in JSON
cannot contain leading zeros: i.e., strings such as "01" do not represent valid
numbers. The syntax diagram makes that particularly explicit by allowing
either a 0 or a digit between 1 and 9. In the above code, the rule is made
implicit by relying on the fact that the parser combinator / is ordered: the
parser on the right of / is only tried if the parser on the left fails. Thus, ($0
asParser / #digit asParser plus) defines numbers as being just a 0 or a sequence
of digits not starting with 0.
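The effect of the ordered choice can be checked in isolation. The sketch below keeps only the integer part of the number production and appends end so that the whole input must be consumed; the variable name integerPart is introduced for the example:

```smalltalk
| integerPart |
"Either a single 0, or one or more digits; then the end of the input."
integerPart := ($0 asParser / #digit asParser plus) end.

integerPart matches: '0'.     "true"
integerPart matches: '10'.    "true: $0 fails on 1, so the right-hand side is tried"
integerPart matches: '01'.    "false: $0 succeeds, then end fails on the remaining 1"
```

Once the left alternative has succeeded, the choice is never revisited, which is what rejects the leading zero.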
The other parsers are fairly trivial:
Script 18.43: Defining missing JSON parsers
PPJsonGrammar>>falseToken
^ 'false' asParser token trim
PPJsonGrammar>>nullToken
^ 'null' asParser token trim
PPJsonGrammar>>trueToken
^ 'true' asParser token trim
18.5
PetitParser Browser
PetitParser is shipped with a powerful browser that can help to develop complex parsers. The PetitParser Browser provides graphical visualization, debugging support, refactoring support, and some other features discussed
later in this chapter. You will see that these features can be very useful
while developing your own parser. Make sure that Glamour is already
loaded in your system (to load Glamour, see Chapter 10). Then, to open the PetitParser Browser,
simply evaluate this expression:
Script 18.45: Opening PetitParser browser
PPBrowser open.
Example shows an automatically generated example based on the definition of the rule (see Figure 18.13 for an example for the prim rule). In
the top-right corner, the reload button generates a new example for the
same rule (see Figure 18.14 for another automatically generated example of the prim rule, this time with a parenthesized expression).
Figure 18.14: Another automatically generated example of the prim rule, after
having clicked the reload button. In this case, the prim example is a parenthesized expression.
First shows the set of terminal parsers that can be activated directly after the
rule has started. As you can see in Figure 18.15, the first set of prim is either
a digit or an opening parenthesis '('. This means that once you start parsing
prim, the input should continue with either a digit or '('.
One can use the first set to double-check that the grammar is specified correctly. For example, if you see '+' in the first set of prim, there is something wrong with the definitions, because the prim rule was never meant
to start with a binary operator.
A terminal parser is a parser that does not delegate to any other parser.
Therefore you don't see parens in the first set of prim, because parens delegates
to other parsers: trimming and sequence parsers (see script 18.46).
You can see '(', which is the first set of parens. The same holds for the number
rule, which creates an action parser delegating to a trimming parser delegating to a flattening parser delegating to a repeating parser delegating to the
#digit parser (see script 18.46). The #digit parser is a terminal parser and
therefore you can see 'digit expected' in the first set. In general, the computation of a first set can be complex, so PPBrowser computes
this information for us.
Follow shows the set of terminal parsers that can be activated directly after the
rule has finished. As you can see in Figure 18.16, the follow set of prim is
the closing parenthesis character parser ')', the star character parser '*', the plus character parser '+' or the epsilon parser (which stands for the empty string). In other
words, once you have finished parsing the prim rule, the input should continue
with one of the ')', '*' or '+' characters, or the input should be completely consumed.
One can use the follow set to double-check that the grammar is specified
correctly. For example, if you see '(' in the follow set of prim, something is
wrong in the definition of your grammar. The prim rule should be followed by a binary operator or a closing parenthesis, not by an opening parenthesis.
In general, the computation of a follow set can be even more complex than
the computation of a first set, so PPBrowser computes this information
for us.
The lower-right side of the browser is related to a particular parsing input.
You can specify an input sample by filling in the text area in the Sample
tab. You can parse the input sample by clicking the play button or by
pressing Cmd-s or Ctrl-s. You can then gain some insight into the parse result
by inspecting the tabs on the bottom-right pane:
Result shows the result of parsing the input sample, which can be inspected by
clicking either the Inspect or Explore buttons. Figure 18.17 shows
the result of parsing (1+2).
Debugger shows a tree view of the steps that were performed during parsing. This is very useful if you don't know exactly what is happening
during parsing. When you select a step, the corresponding subset of the input is highlighted,
so you can see which part of the input was parsed by that step.
For example, you can inspect how the ExpressionGrammar works, which
rules are called and in which order. This is depicted in Figure 18.18.
The grey rules are rules that failed. This usually happens for choice
parsers, and you can see an example for the prod rule (the definition is
in script 18.47). When the parser was parsing the term 12 + 3 * 4, it
tried to parse the mul rule as the first option in prod. But mul requires a star
character '*' at position 2, which is not present, so mul failed and
instead prim with the value 12 was parsed.
Script 18.47: prod rule in ExpressionGrammar
ExpressionGrammar>>prod
^ mul / prim
ExpressionGrammar>>mul
^ prim, $* asParser trim, prod
Tally shows how many times a particular parser was called during parsing. The percentage shows the ratio of the number of calls to the total number of
calls. This can be useful when optimizing the performance of your
parser (see Figure 18.20).
Profile shows how much time was spent in a particular parser during parsing of the input. The percentage shows the ratio of that time to the total time.
This can be useful when optimizing the performance of your parser (see
Figure 18.21).
Progress visually shows how a parser consumes input. The x-axis represents how many characters were read in the input sample, ranging
from 0 (left margin) to the number of characters in the input (right
margin). The y-axis represents time, ranging from the beginning of the
parsing process (top margin) to its end (bottom margin). A line going
from top-left to bottom-right (such as the one in Figure 18.22) shows
that the parser completed its task by only reading each character of
the input sample once. This is the best-case scenario: parsing is linear
in the length of the input. In other words, an input of n characters is
parsed in n steps.
When multiple lines are visible, it means that the parser had to go back
to a previously read character in the input sample to try a different
rule. This can be seen in Figure 18.23. In this example, the parser had
to go back several times to correctly parse the whole input sample: all
input was parsed in n! steps, which is very bad. If you see many backward jumps for a grammar, you should reconsider the order of choice
parsers, restructure your grammar or use a memoized parser. We will
have a detailed look at a backtracking issue in the following section.
Figure 18.22: Progress of a PetitParser parser that parses its input in a linear number of
steps.
Debugging example
As an exercise, we will try to improve a BacktrackingParser from script 18.48.
The BacktrackingParser was designed to accept input corresponding to the regular expressions 'a*b' and 'a*c'. The parser gives us correct results, but there
is a problem with performance. The BacktrackingParser does too much backtracking.
Script 18.48: A parser accepting 'a*b' and 'a*c' with too much backtracking.
PPCompositeParser subclass: #BacktrackingParser
instanceVariableNames: 'ab ap c p'
classVariableNames: ''
poolDictionaries: ''
category: 'PetitTutorial'
BacktrackingParser>>ab
^ 'b' asParser /
('a' asParser, ab)
BacktrackingParser>>c
^ 'c' asParser
BacktrackingParser>>p
^ ab / ap / c
BacktrackingParser>>start
^p
BacktrackingParser>>ap
^ 'a' asParser, p
parsed in a similar way as the 'a*b' strings. You can see such a modification
in script 18.49.
Script 18.49: A slightly better parser accepting 'a*b' and 'a*c'.
PPCompositeParser subclass: #BacktrackingParser
instanceVariableNames: 'ab ac'
classVariableNames: ''
poolDictionaries: ''
category: 'PetitTutorial'
BacktrackingParser>>ab
^ 'b' asParser /
('a' asParser, ab)
BacktrackingParser>>ac
^ 'c' asParser /
('a' asParser, ac)
BacktrackingParser>>start
^ ab / ac
We can check the new metrics for inputc in both Figure 18.30 and Figure 18.31. There is a significant improvement. For inputc, the tally shows only
20 invocations of the parser b and 9 invocations of the parser a. This is a very
good improvement compared to the 110 invocations of the parser b and 55
Figure 18.30: Progress of BacktrackingParser for inputc after the first update.
Figure 18.31: Tally of BacktrackingParser for inputc after the first update.
Figure 18.32: Progress of the BacktrackingParser after the second update for
inputc .
Figure 18.33: Tally of the BacktrackingParser after the second update for inputc .
Version               # of invocations
                      inputb    inputc
Original              28        233
First improvement     28        70
Second improvement    46        48
Table 18.4: Number of parser invocations for inputb and inputc depending
on the version of BacktrackingParser.
BacktrackingParser>>abc
^ ('b' asParser / 'c' asParser) /
('a' asParser, abc)
BacktrackingParser>>start
^ abc
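The behaviour of this last version can be checked outside of the class with a script-style equivalent. The sketch below uses PPDelegateParser for the recursive reference, as was done for the arithmetic expressions; the inputs are illustrative:

```smalltalk
| abc |
"A delegate parser lets the production refer to itself."
abc := PPDelegateParser new.
abc setParser: ('b' asParser / 'c' asParser) / ('a' asParser , abc).

abc end matches: 'aaab'.    "true"
abc end matches: 'aaac'.    "true"
abc end matches: 'aaad'.    "false: neither b nor c terminates the input"
```

Because the decision between b and c is postponed until all the a characters have been consumed, no prefix of the input ever needs to be parsed twice.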
18.6
Packrat Parsers
At the beginning of the chapter, we mentioned four parser methodologies, one of which was packrat parsing. We claimed that packrat parsing gives
linear parse times. But in the debugging example we saw that the original version of the BacktrackingParser parsed inputc of length 10 in 233 steps. And if
you try to parse longinputc = 'aaaaaaaaaaaaaaaaaaaac' (length 20), you will see
that the original parser needs 969 steps. Indeed, the progress is not linear.
The PetitParser framework does not use packrat parsing by default. You
need to send the memoized message to enable packrat parsing. The memoized parser ensures that the parsing for the particular position in an input
and the particular parser will be performed only once and the result will
be remembered in a dictionary for a future use. The second time the parser
wants to parse the input, the result will be looked up in the dictionary. This
way, a lot of unnecessary parsing can be avoided. The disadvantage is that
PetitParser needs much more memory to remember all the results of all the
possible parsers at all the possible positions.
To give you an example with a packrat parser, let us return to the
BacktrackingParser once again (see script 18.48). As we analyzed before,
the problem was in the parser ab, which constantly failed in the p -> ab -> ap
-> p loop. Now we can memoize the parser ab by updating
the method ab as in script 18.51. When memoization is applied, we get
the progress shown in Figure 18.34, with a total of 63 invocations for
inputc and 129 invocations for longinputc. With this minor modification
of BacktrackingParser we get a linear parsing time (relative to the length of the
input) with a factor of around 6.
Script 18.51: Memoized version of the parser ab.
BacktrackingParser>>ab
^ ( 'b' asParser /
('a' asParser, ab)
) memoized
18.7
Chapter summary
parsers (and create grammar) by subclassing PPCompositeParser.
4 http://www.themoosebook.org/book/internals/petit-parser
5 http://scg.unibe.ch/archive/phd/renggli-phd.pdf
Chapter 19
Biographies
Alexandre Bergel¹ is Assistant Professor at the Department of Computer Science, Pleiad Laboratory, at the
University of Chile, in Santiago. Alexandre obtained
his PhD in 2005 from the University of Berne, Switzerland. His PhD was awarded the prestigious
Ernst-Denert prize in 2006. After his PhD, he completed
a first postdoc at Lero & Trinity College Dublin, Ireland,
and a second at the Hasso-Plattner Institute, Germany.
Alexandre and his collaborators carry out research in
software engineering and software quality, more specifically on code profiling, testing and data visualization. Alexandre has authored over 60 articles, published in international and peer-reviewed scientific forums, including the most competitive conferences and journals in the
field of software engineering. Alexandre has participated in over 50 program
committees of international events. Alexandre also has a strong interest in
applying his research results to industry. Several of his research prototypes
have been turned into products.
Damien Cassou² is an associate professor (maître de
conférences) at the University of Lille 1, France, and a
member of the RMoD research group (Inria, LIFL). The
main goal of his research is to solve problems faced
by developers every day, from browsing complex source
code to semi-automatically decomposing large commits.
Before joining RMoD, Damien Cassou got his Ph.D. in
computer science from the University of Bordeaux I:
his thesis was about bringing general-purpose programming tools to dedicated domains through a domain-specific architecture description language and a programming framework
generator. He is one of the developers of Pharo and he collaborated on the
Pharo by Example book.
1 http://bergel.eu
2 http://damiencassou.seasidehosting.st
Stéphane Ducasse³ is directeur de recherche at Inria. Since 2011, he has been scientific deputy of the Inria Lille
Nord Europe research center, where he leads the RMoD
(http://rmod.lille.inria.fr) team. He is an expert in two domains:
object-oriented language design and reengineering. He
worked on traits, composable groups of methods, and
this work has had some impact. Traits have been introduced
in AmbientTalk, Racket, Squeak/Pharo, Perl, PHP and,
in variant form, in Scala and Fortress from Sun Microsystems. He is one of the developers of Pharo. He is also
an expert on software quality, program understanding, program visualizations,
reengineering and metamodeling. He is one of the developers of Moose,
an open-source software analysis platform (http://www.moosetechnology.org).
Stéphane works with Synectique (http://www.synectique.eu), a company building
dedicated tools for advanced software analysis.
Jannik Laval⁴ has been an associate professor at Mines-Telecom Institute, Mines Douai, France, since 2012. He
received his doctorate in computer science from
the University of Lille 1, France, in June 2011. His thesis is
about software quality, visualizations, and reengineering. He uses Moose for all his software analysis
(http://www.moosetechnology.org). At Mines Douai, he works on
software engineering for embedded systems, and more
particularly on modularity and tools for multi-robot systems. He is the main developer of Phratch, a visual programming language on top of Pharo (http://car.mines-douai.fr/category/phratch/).
He uses Phratch for teaching robotics software engineering to engineering students.
3 http://stephane.ducasse.free.fr
4 http://www.jannik-laval.eu