Polymorphsm Using Python
Polymorphsm Using Python
Abstract 1. Introduction
Following the increased popularity of dynamic languages and The increasing use of dynamic languages in critical applica-
their increased use in critical software, there have been many tion domains [21, 25, 30] has prompted academic research on
proposals to retrofit static type system to these languages to “retrofitting” dynamic languages with static typing. Examples
improve possibilities to catch bugs and improve performance. include using type inference or programmer declarations for
A key question for any type system is whether the types Self [1], Scheme [36], Python [4, 5, 29], Ruby [3, 15], JavaS-
should be structural, for more expressiveness, or nominal, to cript [17, 35], and PHP [14].
carry more meaning for the programmer. For retrofitted type Most mainstream programming languages use static typ-
systems, it seems the current trend is using structural types. ing from day zero, and thus naturally imposed constraints
This paper attempts to answer the question to what extent on the run-time flexibility of programs. For example, strong
this extra expressiveness is needed, and how the possible static typing usually guarantees that a well-typed x.m() at
polymorphism in dynamic code is used in practise. compile-time will not fail at run-time due to a “message not
We study polymorphism in 36 real-world open source understood”. This constraint restricts developers to updates
Python programs and approximate to what extent nominal that grow types monotonically.
and structural types could be used to type these programs. Retrofitting a static type system on a dynamic language
The study is based on collecting traces from multiple runs where the definitions of classes, and even individual objects,
of the programs and analysing the polymorphic degrees of may be arbitrarily redefined during runtime poses a signific-
targets at more than 7 million call-sites. ant challenge. In previous work for Ruby and Python, for
Our results show that while polymorphism is used in all example, restrictions have been imposed om the languages to
programs, the programs are to a great extent monomorphic. simplify the design of type systems. The simplifications con-
The polymorphism found is evenly distributed across librar- cern language features like dynamic code evaluation [15, 29],
ies and program-specific code and occur both during program the possibility to make dynamic changes to definitions of
start-up and normal execution. Most programs contain a few classes and methods [4], and possibility to remove methods
“megamorphic” call-sites where receiver types vary widely. [15]. Recent research [2, 24, 27] has shown that the use of
The non-monomorphic parts of the programs can to some such dynamic language features is rare—but non-negligible.
extent be typed with nominal or structural types, but none of Apart from the inherent plasticity of dynamic languages
the approaches can type entire programs. described above, a type system designer must also consider
the fact that dynamic typing gives a language unconstrained
polymorphism. In Python, and other dynamic languages,
Categories and Subject Descriptors D.3 Programming there is no static type information that can be used to control
Languages [D.3.3 Language Constructs and Features]: Poly- polymorphism e.g., for method calls or return values.
morphism Previous retrofitted type systems use different approaches
to handle ad-hoc polymorphic variables. Some state pre-
requisites disallowing polymorphic variables [5, 35], assum-
Keywords Python, dynamic languages, polymorphism, trace- ing that polymorphic variables are rare [29]. Others use a
based analysis flow-sensitive analysis to track how variables change types
[11, 15, 18]. Disallowing polymorphic variables is too re-
strictive as it rules out polymorphic method calls [19, 24, 27].
There are not many published results on the degree of
polymorphism or dynamism in dynamic languages [2, 8, 19,
24, 27]. This makes it difficult to determine whether or not
relying on the absence of, or restricting, some dynamic beha-
viour is possible in practise, and whether certain techniques
for handling difficulties arising due to dynamicity is prefer- 2.1 Polymorphism and Types
able over others. Most definitions of object-oriented programming lists poly-
This article presents the results of a study of the runtime morphism—the ability of an object of type T to appear as of
behaviour of 36 open source Python programs. We inspect another type T 0 —as one of its cornerstones.
traces of runs of these programs to determine the extent to In dynamically typed languages, like Python, polymorph-
which method calls are polymorphic in nature, and the nature ism is not constrained by static checking and error-checking
of that polymorphism, ultimately to find out if programs’ is deferred to the latest possible time for maximal flexibility.
polymorphic behaviour can be fitted into a static type. This means that T and T 0 from above need not be explicitly
1.1 Contributions related (through inheritance or other language mechanisms).
It also means that fields can hold values of any type and still
This paper presents the results of a trace-based study of a function normally (without errors) as long as all uses con-
corpus of 36 open-source Python programs, totalling oven 1 form to the run-time type of the current object they store.
million LOC. Extracting and analysing over 7 million call- This kind of typing/polymorphic behaviour is commonly re-
sites in over 800 million events from trace-logs, we report ferred to as “duck typing” [23].
several findings – in particular: Subtype polymorphism in statically typed languages is
– A study of the run-time types of receiver variables that bounded by the requirements needed for static checking (e.g.,
shows the extent to which the inherently polymorphic that all well-typed method calls can be bound to suitable
nature of dynamic typing is used in practise. methods at run-time). This leads to restrictions for how T and
T 0 may be related. In a nominal system this may mean that
We find that variables are predominantly monomorphic,
the classes used to define T and T 0 must have an inheritance
i.e., only holds values of a single type during a program.
relation. A nominal type is a type that is based on names, that
However, most programs have a few places which are
is that type equality for two objects requires that the name
megamorphic, i.e., variables containing values of many
of the types of the objects is the same. In a structural type
different types at different times or in different contexts.
system, type equivalence and subtyping is decided by the
Hence, a retrofitted type system should consider both these
definition of values’ structures. For example, in OCaml and
circumstances.
Strongtalk, type equivalence is determined by comparing the
– An approximation of the extent to which a program can be fields and methods of two objects and also comparing their
typed using nominal or structural types using three type- signatures (method arguments and return values).
ability metrics for nominal types, nominal types with para- Strachey [33] separates the polymorphism of functions
metric polymorphism, and structural types. We consider into two different categories: ad hoc and parametric. The
both individual call-sites and clusters of call-sites inside a main difference between the categories is that ad-hoc poly-
single source file. morphism lacks the structure brought by parameterisation
We find that, because of monomorphism, most programs and that there is no unified method that makes it possible to
can be typed to a large extent using simple type systems. predict the return type from an ad-hoc polymorphic function
Most polymorphic and megamorphic parts of programs are based on the arguments passed in as would be the case for
not typeable by nominal or structural systems, for example the parametric polymorphic function [33]. As an example of
due to use of value-based overloading. Structural typing is ad hoc polymorphism, consider overloading of / for combin-
only slightly better than nominal typing at handling non- ations of integers and reals always yielding a real.
monomorphic program parts. Cardelli and Wegner [10] further divide polymorphism
into two categories at the top level: universal and ad-hoc.
Our trace data and a version of this article with larger figures
Universal polymorphism corresponds to Strachey’s paramet-
is available from dsv.su.se/~beatrice/python.
ric polymorphism together with call inclusion polymorphism,
Outline The paper is organised as follows. § 2 gives a which includes object-oriented polymorphism (subtypes and
background on polymorphism and types. § 3 describe the inheritance). The common factor for universal polymorphism
motivations and goals of the work. § 4 accounts for how the is that it is based on a common structure (type) [10]. Ad-hoc
work was conducted. § 5 presents the results. § 7 discusses polymorphism, on the other hand, is divided into overload-
related research and finally in § 8 we present our conclusions ing and coercion, where overloading allows using the same
and present ideas for future work. name for several functions and coercion allowing polymorph-
ism in situations when a type can automatically be translated
2. Background to another type [10].
Using the terms from above, “duck typing” can be de-
We start with a background and overview of polymorphism
scribed as a lazy structural typing [23] (late type checking)
and types (§ 2.1) followed by a quick overview of the Python
and is a subcategory of ad-hoc polymorphism [10].
programming language (§ 2.2). A reader with a good under-
standing of these areas may skip over either or both part(s).
2.2 Python 2.3 Measuring Polymorphism
Python is a class based language, but Python’s classes are When the code below is run, a class Foo is first defined
far less static than classes normally found in statically typed containing two methods; init and bar, both expecting
systems. Class definitions are executed during runtime much one argument. The init method creates the instance
like any other code, which means that a class is not available variable a and assigns the expected argument to it. In the
until its definition has been executed. Class definition may bar method a call is made to the method foo on the instance
appear anywhere, e.g., in a subroutine or within one branch variable a and then a call is made to the method baz on the
of a conditional statement. If two class definitions with the argument variable b.
same name are executed within the same name-space, the last 01 10 f = Foo(...)
definition will replace the first (although already created ob- 02 class Foo: 11
jects will keep the old class definition). If a class is reloaded, 03 def __init__(self, a): 12 for e in range(0,100):
it might have been reloaded with a different set of methods 04 self.a = a 13 class Bar:
05 14 def baz(self):
than the original one. Given this possibility to reload classes, 06 def bar(self, b): 15 pass
the same code creating objects from the class C may end up 07 self.a.foo() 16
creating objects of different classes at different times during 08 b.baz() 17 f.bar(Bar())
execution, objects that may have a different set of methods. 09 18
Python allows multiple inheritance, i.e., a class may have After the class definition is finished, a variable f is created
many superclasses [28]. Subclasses may override methods in and it is assigned with a new object of the class Foo.
its superclass(es) and may call methods in its superclass(es). On line 12–17, follows a for loop that will iterate 100
Python’s built-in classes can be used as superclasses. times and for every iteration the class Bar is created with
Python classes are represented as objects at runtime. Class a method baz that has no body. On the last line in the for
objects can contain attributes and methods. All members in a loop a call is made to the method bar for the Foo object in f
Python class (attributes and methods) are public. Methods al- (from line 10) passing a new object of the current Bar class
ways take the receiver of the method call as the first argument. as an argument.
It must be explicitly included in the method’s parameter list Several lines in the code above (7, 8 and 17), contain
but is passed in implicitly in the method call. method calls. These lines are call-sites.
There are two different types of classes available in Py-
thon up to version 3.0: old-style/classic classes and new-style D E F I N I T I O N 1 (Call-site). A call-site in a program is a
classes. The latter were introduced in Python 2.2 (released point (on a line in a Python source file) where a method call
in 2001) to unify class and type hierarchies of the language is made.
and they also (among other things) brought a new method Every call-site has two points, the receiver and the argu-
resolution order for multiple inheritance. From Python 3.0, ment(s), where types may vary depending on the path taken
all classes are new-style classes. through the program up to the call-site. In the analyses made
Python objects are essentially hash tables in that attributes for this paper, the focus has been on the receiver types. Argu-
and methods and their names may be regarded as key-value ments will generally become receivers at a later point in the
pairs. Both attributes and methods may be added, replaced program execution, which means that also that polymorph-
and entirely removed also after initialisation. For an object ism will get captured by the logging.
foo, we can add an attribute bar by simply assigning to On line 17, a call is made to the method bar, where the
that name, i.e. foo.bar = ’Baz’. The same attribute may receiver will always be an object of the class Foo, since the
then be removed, e.g., by the statement del foo.bar, which assignment to f is made from a call to the constructor of
removes both key and value. Foo on line 10. This means that the call-site f.bar(...) on
Classes in Python are thus less templates for object cre- line 10 is monomorphic and will always resolve to the same
ation than what we may be used to from statically typed method at run-time.
languages, but more like factories creating objects–objects
that may later change independent of their class and the other D E F I N I T I O N 2 (Monomorphic). A call-site that has the
objects created from the same class. This more dynamic ap- same receiver type in all observations is monomorphic.
proach to classes has implications on and may increase pro-
The call-site on line 7 may be monomorphic, but that can-
gram polymorphism.
not be concluded from the static information in the available
In nominally typed language, a type Sub is a subtype of
code. The type of the receiver on line 7 depends on the type
another type Sup only if the class Sup is explicitly declared
of the argument to the constructor when the object was cre-
to be its supertype. In some languages, Python for example,
ated. If objects are created storing objects of different types in
this declaration may be updated and changed during runtime.
the instance variable a, the line 7 will potentially be executed
with more than one receiver type, that is, it is polymorphic.
If the number of receiver types is very high, the call-site is
instead megamorphic. Following Agesen [1] we count a call- 2. Extent and degree
site as megamorphic if it has been observed with six or more (a) What is the proportion between monomorphic and
receiver types. polymorphic call-sites?
D E F I N I T I O N 3 (Polymorphic). A call-site that has 2–5 (b) What is the average, median and maximum degrees
different receiver types in all observations is polymorphic. of polymorphism and megamorphism (that is, number
of receiver types) of non-monomorphic call-sites?
(c) To what extent are non-monomorphic call-sites
D E F I N I T I O N 4 (Megamorphic). A call-site that has six “megamorphic”?
or more receiver types in all observations is megamorphic. (d) Does the degree of polymorphism and
Line 8 in the code above shows an example of a mega- megamorphism differ between library and program or
morphic call-site with a call to the method baz for the object between start-up and normal runtime?
in the variable b. The value of b depends on what is passed (e) What types are seen at extremely megamorphic
as the argument with the method call to bar, made on line call-sites (e.g., with 350 different receiver types)?
17. The loop on line 12–17 runs the class definition of Bar
3. Typeability
in every iteration, which means that every call to the method
baz will be made to an object of a new class. Nevertheless, (a) How do types at polymorphic and megamorphic
since the class always has the same name and contains the call-sites and clusters relate to each other in terms of
same fields and methods, the classes created here should be inheritance and overridden methods?
regarded as the same class. This megamorphism is false and (b) To what extent is it possible to find a common super
will not be considered as such by our analysis. type for all the observed receiver types that makes it
possible to fit the polymorphism into a nominal static
3. Motivation and Research Questions type?
A plethora of proposals for static type systems for dynamic (c) To what extent is it possible to find a common super
languages exist [1, 3–5, 15, 17, 29, 35, 36]. The inherent type for all the observed receiver types if the nominal
plasticity of the dynamic languages (for example, the possib- types are extended with parametric polymorphism?
ility to add and remove fields and methods and change an (d) To what extent do receiver types in clusters contain
object’s class at run-time) is a major obstacle for designers all the methods that are called at the call-sites of the
of type systems but the use of these possibilities have been cluster? That is, to what extent can we find a common
shown to be infrequent [2, 19, 24, 27]. Additionally, a type structural type for all the receiver types found in
system designer must also take duck typing into consider- clusters?
ation, where objects of statically unrelated classes may be
used interchangeably in places where common subsets of Following [2, 19, 24, 27] we also examine the applicability
their methods are used. of the phenomenon of Folklore, put forward by Richards et
We examine several aspects of Python programs of in- al [27] which states that there is an initialisation phase that is
terest to designers of type systems for dynamic languages more dynamic than other phases of the runtime. We compare
in general and for Python specifically. These aspects of pro- if there are differences in the use of polymorphism depending
gram dynamicity may also be used to enable comparisons of on where we find the method calls; during start-up vs. during
different proposed type system solutions. normal execution and also if there are differences between
We study Python’s unlimited polymorphism—duck typing— libraries and program-specific code.
in particular the degree of polymorphism in receivers of
method calls in typical programs: How many different types 4. Methodology
are used and how related the receivers’ types are e.g., in Studying how polymorphism is used in Python programs
terms of inheritance. We study how the underlying dynamic necessitates studying real programs. We discarded static ap-
nature of Python affects the polymorphism of programs due proaches such as program analysis and abstract interpretation
to classes being dynamically created and possibly modified because of their over-approximate nature. Instead, we base
at run-time. our study on traces of running programs obtained by an in-
Analysis Questions Our questions belong to three categor- strumented version of the standard CPython interpreter that
ies: program structure, extent and degree and typeability: saves data about all method calls made throughout a program
run. Our instrumented interpreter is based on CPython 2.6.6
1. Program structure because of Debian packaging constraints, which was import-
(a) How many classes do Python programs use/create at ant to study certain proprietary code which in the end did not
run-time? How often are classes redefined? end up in this study.
(b) How many methods do Python classes have and how The results are obtained from in total 522 runs of 36 open
many methods are overridden in subclasses? source Python programs (see Table 1) collected from Source-
Forge [32]. Selection was based on programs’ popularity b a
(>1,000 downloads), that the program was still maintained B B
(updated during the last 12 months) and was classified as f: A f: C
stable, i.e., had been under development for some time. For
pragmatic reasons, we excluded programs that used C exten- A ≮∶ C
sions, and programs that for various reasons would not run C≮∶ A
under Debian. For equally pragmatic reasons, we excluded
Figure 1. Parametric polymorphism. Different instances of
plugins (e.g., to web browsers), programs that required spe-
B hold objects of different types in the f fields.
cific hardware (e.g., microscopes, network equipment or serv-
ers) and software that required subscriptions (e.g., poker site
accounts). based on how many receiver types were found according to
To separate events in the start-up phase from ”normal the following categories:
Wednesday 28 January 15