Python PreRead 1
Python
Python is a high-level, dynamically typed multiparadigm programming language,
created by Guido van Rossum in the early 90s. It is now one of the most popular
languages in existence.
Python’s syntactic clarity allows you to express very powerful ideas in very few lines
of code while being very readable. It’s basically executable pseudocode!
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)
Python Versions
If you haven’t experimented with Python at all and are just starting off, we
recommend you begin with the latest version of Python 3.
You can double-check your Python version at the command line after activating
your environment by running python --version .
Google’s Python Style Guide is a fantastic resource with a list of dos and don’ts for
formatting Python code that is commonly followed in the industry.
Python Setup: Remote vs. Local offers in-depth coverage of the various remote
and local options available.
which outputs:
For more on the coding principles above, refer to The Zen of Python, Explained.
Indentation
Python is big on indentation! Where in other programming languages the
indentation in code is to improve readability, Python uses indentation to indicate a
block of code.
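As a minimal sketch of indentation-defined blocks (the function name describe is only illustrative and not part of the original text), the indented lines under def and if form their bodies:

```python
def describe(n):
    # Everything indented under "def" is the body of the function.
    if n % 2 == 0:
        # This deeper block runs only when the condition is true.
        return "even"
    # Back at function level: reached only when the "if" block did not return.
    return "odd"

labels = [describe(n) for n in range(3)]
print(labels)  # ['even', 'odd', 'even']
```

Changing the indentation of a line changes which block it belongs to, so inconsistent indentation is a syntax error rather than a style issue.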
Code Comments
Python supports two types of comments: single-line and multi-line, as detailed
below:
Variables
In Python, there are no declarations unlike C/C++; only assignments:
Variables are “names” in Python that simply refer to objects. This implies that you
can make another variable point to an object by assigning it the original variable that
was pointing to the same object.
a # Returns "[1, 2, 3, 4]"
b # Also returns "[1, 2, 3, 4]"
is checks if two variables refer to the same object, while == checks if the objects
that the variables point to have the same values:
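A small sketch of the difference, using illustrative variable names: b is bound to the same list object as a, while c is a separate list that merely has equal contents.

```python
a = [1, 2, 3]
b = a            # b refers to the same list object as a
c = [1, 2, 3]    # c refers to a new list with equal contents

same_object = a is b         # True: both names point to one object
equal_values = a == c        # True: the contents compare equal
distinct_objects = a is c    # False: two separate objects

b.append(4)      # Mutating through b is visible through a as well
print(a)         # [1, 2, 3, 4]
print(c)         # [1, 2, 3]
```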
In Python, everything is an object. This means that even None (which is used to
denote that the variable doesn’t point to an object yet) is also an object!
Python has local and global variables. Here’s an example of local vs. global variable
scope:
x = 5

def set_x(num):
    # Local variable x is not the same as the global variable x
    x = num       # Local x is 43
    print(x)      # Prints 43

def set_global_x(num):
    global x
    print(x)      # Prints 5
    x = num       # Global variable x is now set to 6
    print(x)      # Prints 6

set_x(43)
set_global_x(6)
Print Function
Python has a print function:
print("I'm Python. Nice to meet you!") # Prints I'm Python. Nice to meet you!
By default, the print function also prints out a newline at the end. Override the
optional argument end to modify this behavior:
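For instance, a short sketch in which the first call replaces the trailing newline with an exclamation mark, so the second call continues on the same line:

```python
print("Hello, World", end="!")    # No newline is written after this call
print(" Still on the same line.")
# Together these print: Hello, World! Still on the same line.
```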
Input Function
Python offers a simple way to get input data from console:
Order of Operations
Just like mathematical operations in other languages, Python uses the BODMAS
rule (also called the PEMDAS rule) to ascertain operator precedence. BODMAS is
an acronym and it stands for Bracket, Of, Division, Multiplication, Addition, and
Subtraction.
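A few expressions illustrate how precedence plays out in practice. This is only a sketch with illustrative variable names; the Python documentation has the full precedence table.

```python
no_parens = 2 + 3 * 4        # Multiplication binds tighter: 2 + 12 = 14
with_parens = (2 + 3) * 4    # Parentheses are evaluated first: 5 * 4 = 20
power_first = 2 * 3 ** 2     # Exponentiation before multiplication: 2 * 9 = 18
left_to_right = 20 / 5 * 2   # Equal precedence, evaluated left to right: 8.0
negated_power = -2 ** 2      # ** binds tighter than unary minus: -(2 ** 2) = -4
```

When in doubt, explicit parentheses make the intended order obvious to the reader.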
Numbers
Integers work as you would expect from other languages:
x = 3
x # Prints "3"
type(x) # Prints "<class 'int'>"
x + 1 # Addition; returns "4"
x - 1 # Subtraction; returns "2"
x * 2 # Multiplication; returns "6"
x ** 2 # Exponentiation; returns "9"
x += 1 # x is now "4"
x *= 2 # x is now "8"
x % 4 # Modulo operation; returns "0" (x is 8 at this point)
y = 2.5
type(y) # Returns "<class 'float'>"
y, y + 1, y * 2, y ** 2 # Returns "(2.5, 3.5, 5.0, 6.25)"
Some nuances in integer/float division that you should take note of:
# Integer division rounds down for both positive and negative numbers
-5 // 3 # -2
5.0 // 3.0 # 1.0
-5.0 // 3.0 # -2.0
Note that unlike many languages, Python does not have unary increment ( x++ ) or
decrement ( x-- ) operators, but accepts the += and -= operators.
Python also has built-in types for complex numbers; you can find all of the details in
the Python documentation.
In the example below, underscores are used to group decimal numbers by
thousands.
large_num = 1_000_000
large_num # Returns 1000000
Booleans
Python implements all of the usual operators for Boolean logic, but uses English
words rather than symbols ( && , || , etc.):
t = True
f = False
type(t) # Returns "<class 'bool'>"
t and f # Logical AND; returns "False"
t or f # Logical OR; returns "True"
not t # Logical NOT; returns "False"
t != f # Logical XOR; returns "True"
None , 0 , and empty strings/lists/dicts/tuples all evaluate to False . All other values
are True .
bool({}) # Returns False
bool(()) # Returns False
# Equality is ==
1 == 1 # Returns True
2 == 1 # Returns False
# Inequality is !=
1 != 1 # Returns False
2 != 1 # Returns True
# More comparisons
1 < 10 # Returns True
1 > 10 # Returns False
2 <= 2 # Returns True
2 >= 2 # Returns True
Casting integers as booleans transforms a non-zero integer to True , while zeros get
transformed to False :
Using logical operators with integers casts them to booleans for evaluation, using
the same rules as mentioned above. However, note that the original pre-cast value
is returned.
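The following sketch shows both behaviors: explicit casting with bool(), and logical operators handing back one of the original operands rather than a boolean. The variable names are illustrative.

```python
zero_is_false = bool(0)      # False
nonzero_is_true = bool(4)    # True
negative_is_true = bool(-6)  # True

zero_and_two = 0 and 2       # 0: "and" returns the first falsy operand
five_or_zero = -5 or 0       # -5: "or" returns the first truthy operand
two_and_three = 2 and 3      # 3: both are truthy, so the last operand is returned
zero_or_empty = 0 or ""      # "": both are falsy, so the last operand is returned
```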
Strings
Python has great support for strings:
hello = 'hello' # String literals can use single quotes
world = "world" # or double quotes; it does not matter.
# But note that you can nest one in another, e.g., 'a"x"b' and "a'x'b"
print(hello) # Prints "hello"
len(hello) # String length; returns "5"
hello[0] # A string can be treated like a list of characters; returns 'h'
hello + ' ' + world # String concatenation using '+'; returns "hello world"
"hello " "world" # String literals (but not variables) can be concatenated without using '+'; returns "hello world"
'%s %s %d' % (hello, world, 12) # sprintf-style string formatting; returns "hello world 12"
s = "hello"
s.capitalize() # Capitalize a string; returns "Hello"
s.upper() # Convert a string to uppercase; returns "HELLO"
s.rjust(7) # Right-justify a string, padding with spaces; returns "  hello"
s.center(7) # Center a string, padding with spaces; returns " hello "
s.replace('l', '(ell)') # Replace all instances of one substring with another;
# returns "he(ell)(ell)o"
'  world '.strip() # Strip leading and trailing whitespace; returns "world"
You can find a list of all string methods in the Python documentation.
String Formatting
Python has several different ways of formatting strings. Simple positional formatting
is probably the most common use-case. Use it if the order of your arguments is not
likely to change and you only have very few elements you want to concatenate.
Since the elements are not represented by something as descriptive as a name this
simple style should only be used to format a relatively small number of elements.
New Style/Python 2.6
The new style uses ''.format() as follows:
Note that both the old and new style of formatting are still compatible with the
newest releases of Python, which is version 3.8 at the time of writing.
With the new style formatting, you can give placeholders an explicit positional index
(called positional arguments). This allows for re-arranging the order of display
without changing the arguments. This operation is not available with old-style
formatting.
For the example print('{0} {1} cost ${2}'.format(6, 'bananas', 1.74)) , the output is 6 bananas cost $1.74.
You can also use named placeholders, filled in by keyword arguments instead of
positional indices, to produce the same result.
For the example print('{quantity} {item} cost ${price}'.format(quantity=6, item='bananas', price=1.74)) , the output is 6 bananas cost $1.74.
F-strings
Starting with Python 3.6, you can also format strings using f-string literals, which are
much more powerful than the old/new string formatters we discussed earlier:
name = "Reiko"
f"She said her name is {name}." # Returns "She said her name is Reiko."
# You can put any Python expression inside the braces and its value will be output in the string.
f"{name} is {len(name)} characters long." # Returns "Reiko is 5 characters long."
Unfortunately, the default alignment differs between old- and new-style formatting:
for strings, the old style defaults to right-aligned while the new style defaults to
left-aligned.
To align text left:
Again, the new style formatting surpasses the old variant by providing more control
over how values are padded and aligned. You are able to choose the padding
character and override the default space character for padding. This operation is not
available with old-style formatting.
You can also center-align values. This operation is not available with old-style
formatting.
When using center alignment where the length of the string leads to an uneven split
of the padding characters, the extra character will be placed on the right side. This
operation is not available with old-style formatting.
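The alignment rules discussed above can be sketched as follows. The sample strings 'test' and 'zip' and the variable names are illustrative choices, not taken from the original examples.

```python
new_default = '{:10}'.format('test')     # 'test      ' (new style: left-aligned)
new_right = '{:>10}'.format('test')      # '      test'
old_default = '%10s' % 'test'            # '      test' (old style: right-aligned)
old_left = '%-10s' % 'test'              # 'test      '
custom_pad = '{:_<10}'.format('test')    # 'test______' (custom padding character)
centered = '{:^10}'.format('test')       # '   test   '
uneven_center = '{:^6}'.format('zip')    # ' zip  ' (extra space goes on the right)
```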
You can also combine the field numbering (say, {0} for the first argument)
specification with the format type (say, {:s} for strings):
Unpacking arguments:
f-strings can also be formatted similarly:
As an example:
test2 = "test2"
test1 = "test1"
test0 = "test0"
s1 = 'a'
s2 = 'ab'
s3 = 'abc'
s4 = 'abcd'
print(f'{s1:>10}') # Prints '         a' (right-aligned in a field of width 10)
print(f'{s2:>10}') # Prints '        ab'
print(f'{s3:>10}') # Prints '       abc'
print(f'{s4:>10}') # Prints '      abcd'
Numbers
Of course it is also possible to format numbers.
Integers:
Floats:
Padding Numbers
Similar to strings, numbers can also be constrained to a specific width.
Again, similar to truncating strings, the precision for floating-point numbers limits the
number of positions after the decimal point. For floating-point numbers, the padding value
represents the length of the complete output (including the decimal point). In the example
below, we want our output to have at least 6 characters, with 2 after the decimal
point.
For integer values providing a precision doesn’t make much sense and is actually
forbidden in the new style (it will result in a ValueError ).
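A brief sketch of width, zero-padding, precision, and sign handling with the new style follows. The values 42 and 3.141592653589793 are illustrative.

```python
padded_int = '{:4d}'.format(42)                      # '  42'
zero_padded_int = '{:04d}'.format(42)                # '0042'
padded_float = '{:06.2f}'.format(3.141592653589793)  # '003.14' (6 characters in total)
always_signed = '{:+d}'.format(42)                   # '+42'
space_for_positive = '{: d}'.format(42)              # ' 42'

# A precision on an integer format is rejected by the new style.
try:
    '{:.2d}'.format(42)
    precision_error = None
except ValueError as exc:
    precision_error = exc
```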
Some examples:
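Show a space for positive numbers, but a sign for negative numbers:
'{: f}; {: f}'.format(3.14, -3.14) # Returns " 3.140000; -3.140000"
Show only the minus sign (same as '{:f}; {:f}'):
'{:-f}; {:-f}'.format(3.14, -3.14) # Returns "3.140000; -3.140000"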
Convert a value to different bases using {:d} , {:x} , {:o} , and {:b} :
"int: {0:d}; hex: {0:x}; oct: {0:o}; bin: {0:b}".format(42) # Returns "int: 42; hex: 2a; oct: 52; bin: 101010"
With 0x , 0o , or 0b as prefix:
Expressing a percentage:
import datetime
d = datetime.datetime(2010, 7, 4, 12, 15, 58)
'{:%Y-%m-%d %H:%M:%S}'.format(d) # Returns "2010-07-04 12:15:58"
width = 5
for num in range(5, 12):
    for base in 'dXob':
        print('{0:{width}{base}}'.format(num, base=base, width=width), end=' ')
    print()
# Prints:
#     5     5     5   101
#     6     6     6   110
#     7     7     7   111
#     8     8    10  1000
#     9     9    11  1001
#    10     A    12  1010
#    11     B    13  1011
Format width:
sentence.find("day") # Returns 2
sentence.find("nice") # Returns -1
You can also provide the starting and stopping position of the search:
Note that str.index() performs the same search, but raises a ValueError instead of
returning -1 when the substring is not found.
Replace One String with Another String Using Regular
Expressions
If you want to either replace one string with another string or to change the order of
characters in a string, use re.sub() .
re.sub() allows you to use a regular expression to specify the pattern of the string
you want to swap.
In the code below, we replace 3/7/2021 with Sunday and replace 3/7/2021 with
2021/3/7.
import re
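One way the two replacements described above could be written is sketched below. The sample sentence in text and the variable names are assumptions made for illustration.

```python
import re

text = "Today is 3/7/2021"              # Hypothetical input string
date_pattern = r"(\d+)/(\d+)/(\d+)"     # Three groups: month, day, year

# Replace the whole match with a fixed string.
with_day = re.sub(date_pattern, "Sunday", text)

# Reorder the captured groups: month/day/year -> year/month/day.
reordered = re.sub(date_pattern, r"\3/\1/\2", text)

print(with_day)    # Today is Sunday
print(reordered)   # Today is 2021/3/7
```

Backreferences such as \1 in the replacement string refer to the parenthesized groups of the pattern, which is what makes reordering possible.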
Containers
A container is any object that holds an arbitrary number of other objects. Generally,
containers provide a way to access the contained objects and to iterate over them.
Python includes several built-in container types: lists, dictionaries, sets, and tuples:
from collections.abc import Container, Mapping # These ABCs live in collections.abc; the old collections aliases were removed in Python 3.10
isinstance(list(), Container) # Returns True
isinstance(tuple(), Container) # Returns True
isinstance(set(), Container) # Returns True
isinstance(dict(), Container) # Returns True
# Note that the "dict" datatype is also a mapping datatype (along with being a container):
isinstance(dict(), Mapping) # Returns True
Lists
A list is the Python equivalent of an array, but is resizable and can contain elements
of different types:
l = [3, 1, 2] # Create a list
l[0] # Access a list like you would any array; returns "3"
l[4] # Looking out-of-bounds raises an "IndexError"
l[::-1] # Returns the list in reverse order: "[2, 1, 3]"
l, l[2] # Returns "([3, 1, 2], 2)"
l[-1] # Negative indices count from the end of the list; returns "2"
As usual, you can find all the gory details about lists in the Python documentation.
If you want access to the index of each element within the body of a loop, use the
built-in enumerate function:
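A minimal sketch with an illustrative list of animal names:

```python
animals = ["cat", "dog", "monkey"]
lines = []
for idx, animal in enumerate(animals):
    # enumerate yields (index, element) pairs, with indices starting at 0
    lines.append(f"#{idx + 1}: {animal}")
print(lines)  # ['#1: cat', '#2: dog', '#3: monkey']
```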
List Comprehensions
List comprehensions are a tool for transforming one list (any iterable actually) into
another list. During this transformation, elements can be conditionally included in
the new list and each element can be transformed as needed.
If you can rewrite your code to look just like this for loop, you can also rewrite it as
a list comprehension:
new_things = []
for item in old_things:
    if condition_based_on(item):
        new_things.append("something with " + item)
You can rewrite the above for loop as a list comprehension like this:
Copying the expression that we’ve been appending into this new list.
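The for loop shown earlier in this section can be expressed as a comprehension along these lines. The contents of old_things and the body of condition_based_on are hypothetical stand-ins so that the sketch runs.

```python
old_things = ["apple", "kiwi", "banana"]   # Hypothetical input

def condition_based_on(item):
    # Hypothetical condition: keep only the longer names
    return len(item) > 4

# expression   for item in iterable   if condition
new_things = ["something with " + item for item in old_things if condition_based_on(item)]
print(new_things)  # ['something with apple', 'something with banana']
```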
As a simple example, consider the following code that computes square numbers:
nums = [0, 1, 2, 3, 4]
squares = []
for x in nums:
    if x % 2 == 0:
        squares.append(x ** 2)
squares # Returns [0, 4, 16]
nums = [0, 1, 2, 3, 4]
[x ** 2 for x in nums if x % 2 == 0] # Returns [0, 4, 16]
nums = [0, 1, 2, 3, 4]
[x ** 2 for x in nums] # Returns [0, 1, 4, 9, 16]
You can also use if / else in a list comprehension. Note that this actually uses a
different language construct, a conditional expression, which itself is not part of the
comprehension syntax, while the if after the for … in is part of the list
comprehension syntax.
nums = [0, 1, 2, 3, 4]
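To make the distinction concrete, here is a sketch with illustrative variable names: the conditional expression sits in the output position and transforms every element, whereas the trailing if filters elements out.

```python
nums = [0, 1, 2, 3, 4]

# Conditional expression: every element is kept, but mapped differently.
labels = ["even" if x % 2 == 0 else "odd" for x in nums]

# Trailing "if": part of the comprehension syntax, drops elements entirely.
evens_only = [x for x in nums if x % 2 == 0]

# Both can be combined in one comprehension.
labeled_nonzero = ["even" if x % 2 == 0 else "odd" for x in nums if x != 0]
```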
Nested Loops
In this section, we’ll tackle list comprehensions with nested looping.
flattened = []
for row in matrix:
    for n in row:
        flattened.append(n)
Nested loops in list comprehensions do not read like English prose. A common
pitfall is to read this list comprehension as:
But that’s not right! We’ve mistakenly flipped the for loops here. The correct
version is the one above.
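As a sketch of the ordering rule, with an illustrative matrix: the for clauses in the comprehension appear in the same order as they would in the equivalent nested for loops.

```python
matrix = [[1, 2, 3], [4, 5, 6]]   # Illustrative input

# Outer loop first, inner loop second, exactly as in the nested for loops.
flattened = [n for row in matrix for n in row]
print(flattened)  # [1, 2, 3, 4, 5, 6]

# The flipped version uses "row" before it has been bound:
# [n for n in row for row in matrix]   # NameError if "row" does not already exist
```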
When working with nested loops in list comprehensions remember that the for
Slicing
In addition to accessing list elements one at a time, Python provides concise syntax
to access sublists; this is known as slicing:
Assigning to a slice (even with a source of different length) is possible since lists are
mutable:
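A short sketch of reading slices and then assigning to one, with illustrative variable names:

```python
nums = list(range(5))      # [0, 1, 2, 3, 4]
middle = nums[2:4]         # [2, 3]: from index 2 up to (not including) index 4
tail = nums[2:]            # [2, 3, 4]: from index 2 to the end
head = nums[:2]            # [0, 1]: from the start up to index 2
all_but_last = nums[:-1]   # [0, 1, 2, 3]: slice indices can be negative

nums[2:4] = [8, 9, 10]     # Replace two elements with three
print(nums)                # [0, 1, 8, 9, 10, 4]
```

Each slice is a new list, so the earlier results are unaffected by the later assignment.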
List Functions
l = [1, 2, 3]
l_copy = l[:] # Make a shallow copy so that l itself is left untouched
del l_copy[2] # Remove arbitrary elements from a list with "del"; l_copy is now [1, 2]
l.index(3) # Get the index of the first item found matching the argument; returns 2
# l.index(4) # Raises a ValueError as 4 is not in the list
l.append(l_copy) # "append()" adds its argument as a single element; l is now [1, 2, 3, [1, 2]]
1 in l # Check for existence (also called "membership check") in a list with "in"; returns True
List concatenation using .extend() can be achieved using the in-place addition
operator, += .
Instead of needing to create an explicit list using the source argument (on the right),
as a hack, you can simply use a trailing comma to create a tuple out of the source
argument (and thus imitate the above functionality):
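A minimal sketch of both forms:

```python
l = [1, 2, 3]
l += [4, 5]    # Same effect as l.extend([4, 5])
l += 6,        # The trailing comma makes the right-hand side the tuple (6,)
print(l)       # [1, 2, 3, 4, 5, 6]

# Without the comma the right-hand side is a plain int, which is not iterable:
# l += 6       # TypeError: 'int' object is not iterable
```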
Dictionaries
A dictionary stores (key, value) pairs, similar to a Map in Java or an object in
JavaScript. In other words, dictionaries store mappings from keys to values.
"1"
'two' in d # Check if a dictionary has a given key; returns "True"
d['four'] = 4 # Set an entry in a dictionary
d['four'] # Returns "4"
You can find all you need to know about dictionaries in the Python documentation.
In the above snippet, four does not exist in d . We get a KeyError when we try to
access d[four] . As a result, in many situations, we need to check if the key exists in
a dictionary before we try to access it.
<dict>.get()
The get() method supports a default argument which is returned when the key
being queried is missing:
A good use-case for get() is getting values in a nested dictionary with missing keys
where it can be challenging to use a conditional statement:
fruits = [
{"name": "apple", "attr": {"color": "red", "taste": "sweet"}},
{"name": "orange", "attr": {"taste": "sour"}},
{"name": "grape", "attr": {"color": "purple"}},
{"name": "banana"},
]
colors = [fruit["attr"]["color"]
if "attr" in fruit and "color" in fruit["attr"] else "unknown"
for fruit in fruits]
colors # Returns ['red', 'unknown', 'purple', 'unknown']
In contrast, a better way is to use the get() method twice like below. The first get
method will return an empty dictionary if the key attr doesn’t exist. The second
get() method will return unknown if the key color doesn’t exist.
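The chained get() calls described above can be sketched as follows. The fruits list is repeated here so that the snippet is self-contained.

```python
fruits = [
    {"name": "apple", "attr": {"color": "red", "taste": "sweet"}},
    {"name": "orange", "attr": {"taste": "sour"}},
    {"name": "grape", "attr": {"color": "purple"}},
    {"name": "banana"},
]

# First get(): fall back to an empty dict when "attr" is missing.
# Second get(): fall back to "unknown" when "color" is missing.
colors = [fruit.get("attr", {}).get("color", "unknown") for fruit in fruits]
print(colors)  # ['red', 'unknown', 'purple', 'unknown']
```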
defaultdict
We can also pass in a lambda as the factory function to return custom default
values. Let’s say for our default value we return the tuple (0, 0) .
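As a sketch of how this might look (the keys and variable names are illustrative), a defaultdict calls its factory whenever a missing key is accessed and stores the result:

```python
from collections import defaultdict

# int() returns 0, so missing keys start counting from zero.
counts = defaultdict(int)
for letter in "banana":
    counts[letter] += 1

# A lambda as the factory function returns a custom default value.
points = defaultdict(lambda: (0, 0))
points["home"] = (3, 4)
origin = points["missing"]   # (0, 0); the key "missing" is now stored as well
```

Note that merely reading a missing key inserts it, which differs from the behavior of dict.get().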
Using a defaultdict can help reduce the clutter in your code, speeding up your
implementation.
del
Key Datatypes
Note that as we saw in the section on tuples, keys for dictionaries have to be
hashable (typically immutable) datatypes, such as ints, floats, strings, tuples, etc.
This is to ensure that the key can be converted to a constant hash value for quick look-ups.
invalid_dict = {[1, 2, 3]: "123"} # Raises a "TypeError: unhashable type: 'list'"
valid_dict = {(1, 2, 3): [1, 2, 3]} # Values can be of any type, however.
Get all keys as an iterable with keys() . Note that we need to wrap the call in list()
to turn it into a list, as seen in the putting it all together section on iterators. Note
that for Python versions <3.7, dictionary key ordering is not guaranteed, which is
why your results might not match the example below exactly. However, as of Python
3.7, dictionary items maintain the order with which they are inserted into the
dictionary.
Get all values as an iterable with values() . Once again we need to wrap it in list()
to convert the iterable into a list by generating the entire list at once. Note that the
discussion above regarding key ordering holds below as well.
If you want access to keys and their corresponding values, use the items method:
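A compact sketch of the three views, using an illustrative dictionary named filled_dict (insertion order is preserved on Python 3.7 and later):

```python
filled_dict = {"one": 1, "two": 2, "three": 3}

keys = list(filled_dict.keys())       # ['one', 'two', 'three']
values = list(filled_dict.values())   # [1, 2, 3]

pairs = []
for key, value in filled_dict.items():
    # items() yields (key, value) tuples that can be unpacked directly
    pairs.append(f"{key} -> {value}")
print(pairs)  # ['one -> 1', 'two -> 2', 'three -> 3']
```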
Dictionary Comprehensions
These are similar to list comprehensions, but allow you to easily construct
dictionaries.
As an example, consider a for loop that makes a new dictionary by swapping the
keys and values of the original one:
flipped = {}
for key, value in original.items():
    flipped[value] = key
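The same flip can be written as a dictionary comprehension. The contents of original below are an illustrative assumption.

```python
original = {"a": 1, "b": 2, "c": 3}   # Hypothetical input

# key_expression: value_expression   for ... in ...
flipped = {value: key for key, value in original.items()}
print(flipped)  # {1: 'a', 2: 'b', 3: 'c'}
```

This only works cleanly when the original values are hashable and unique; duplicate values would overwrite one another as keys.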
As another example:
nums = [0, 1, 2, 3, 4]
even_num_to_square = {x: x ** 2 for x in nums if x % 2 == 0}
print(even_num_to_square) # Prints "{0: 0, 2: 4, 4: 16}"
Sets
A set is an unordered collection of distinct elements. In other words, sets do not
allow duplicates and thus lend themselves to use cases that involve retaining
unique elements (and removing duplicates). As a simple example,
consider the following:
len(animals) # Number of elements in a set; returns "3"
animals.add('cat') # Adding an element that is already in the set does nothing
len(animals) # Returns "3"
animals.remove('cat') # Remove an element from a set
len(animals) # Returns "2"
animals # Returns "{'fish', 'dog'}"
animals = set()
animals.update(['fish', 'dog']) # add() takes exactly one element; update() adds several. animals is now {'fish', 'dog'}
s = {1, 2, 3}
s1 = s.copy() # s1 is {1, 2, 3}
s1 is s # Returns False
Set Operations
# Check if set on the left is a subset of set on the right
{1, 2} <= {1, 2, 3} # Returns True
As usual, everything you want to know about sets can be found in the Python
documentation.
Set Comprehensions
Like lists and dictionaries, we can easily construct sets using set comprehensions.
As an example, consider a for loop that creates a set of all the first letters in a
sequence of words:
first_letters = set()
for w in words:
    first_letters.add(w[0])
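The loop above can be collapsed into a set comprehension, sketched here with an illustrative words list:

```python
words = ["apple", "avocado", "banana", "cherry"]   # Hypothetical input

# Curly braces with a single expression (no colon) build a set.
first_letters = {w[0] for w in words}
print(sorted(first_letters))  # ['a', 'b', 'c']
```

Since sets are unordered, the example sorts the result before printing to get a stable display.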
As another example:
Yet another example:
Tuples
A tuple is an immutable ordered list of values.
t = (1, 2, 3)
t[0] # Returns "1"
t[0] = 3 # Raises a "TypeError: 'tuple' object does not support item assignment"
Note that syntactically, a tuple of length one has to have a comma after the last
element but tuples of other lengths, even zero, do not:
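A quick sketch of the rule, with illustrative names:

```python
empty = ()           # Zero elements: no comma needed
single = (1,)        # One element: the trailing comma is required
not_a_tuple = (1)    # Just the integer 1 in parentheses
pair = (1, 2)        # Two or more elements: no trailing comma needed

print(type(single).__name__)       # tuple
print(type(not_a_tuple).__name__)  # int
```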
l = [1, 2]
d[l] # Raises a "TypeError: unhashable type: 'list'"
d[[1, 2]] # Raises a "TypeError: unhashable type: 'list'"
len(tup) # Returns 3
tup + (4, 5, 6) # Returns (1, 2, 3, 4, 5, 6)
tup[:2] # Returns (1, 2)
2 in tup # Returns True
You can find all you need to know about tuples in the Python documentation.
Functions
Python functions are defined using the def keyword:
def sign(x):
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'
We will often define functions to take (optional) keyword arguments, like this:
Keyword arguments can arrive in any order:
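As a minimal sketch of both points (the function hello and its parameter loud are illustrative and not taken from the original), a parameter with a default value becomes optional, and named arguments may be passed in any order:

```python
def hello(name, loud=False):
    # "loud" is an optional keyword argument with a default value
    if loud:
        return f"HELLO, {name.upper()}!"
    return f"Hello, {name}"

default_call = hello("Bob")                 # 'Hello, Bob'
keyword_call = hello("Fred", loud=True)     # 'HELLO, FRED!'
any_order = hello(loud=True, name="Ann")    # 'HELLO, ANN!'
```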
You can define functions that take a variable number of positional arguments. In a
function definition, * packs all positional arguments in a tuple (this process is called
tuple-packing).
def varargs(*args):
    return args
You can define functions that take a variable number of keyword arguments, as
well. In a function definition, ** packs all keyword arguments in a dictionary (this
process is called dictionary-packing).
def keyword_args(**kwargs):
    return kwargs
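In a function call, * unpacks all arguments in a tuple (tuple-unpacking), and **
unpacks all arguments in a dictionary (this process is called dictionary-unpacking).
def all_the_args(*args, **kwargs): # Assumed helper; its definition is not shown in this section
    return args, kwargs
args = (1, 2, 3, 4)
kwargs = {"a": 3, "b": 4}
all_the_args(*args) # equivalent to all_the_args(1, 2, 3, 4)
all_the_args(**kwargs) # equivalent to all_the_args(a=3, b=4)
all_the_args(*args, **kwargs) # equivalent to all_the_args(1, 2, 3, 4, a=3, b=4)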
With Python, you can return multiple values from functions as intuitively as returning
a single value:
def swap(x, y):
    return y, x # Return multiple values as a tuple without the parentheses
x = 1
y = 2
x, y = swap(x, y) # Now x = 2, y = 1
# (x, y) = swap(x, y) # Again, the parentheses have been excluded but can be included.
Python’s functions are first-class objects. This implies that you can assign them to
variables, store them in data structures, pass them as arguments to other functions,
and even return them as values from other functions:
def create_adder(x):
    def adder(y):
        return x + y
    return adder
add_10 = create_adder(10)
add_10(3) # Returns 13
Note that the following short-circuit AND pattern is occasionally seen in function
return statements. It is rarely used, but it is still valuable to know:
# Short-circuit AND: The expression x and y first evaluates x; if x is false, its value is
# returned; otherwise, y is evaluated and the resulting value is returned.
# Per https://docs.python.org/3.6/reference/expressions.html#boolean-operations
def short_circuit_and(a, b):
    return a and b
print(short_circuit_and(a=None, b=1)) # prints None
print(short_circuit_and(a=1, b=None)) # prints None
print(short_circuit_and(a=1, b=2)) # prints 2
if a:
    return b
else:
    return None
Nested Functions
A function defined inside another function is called a nested function. Nested
functions can access variables of the enclosing scope.
In Python, these “non-local” variables are read-only by default and we must declare
them explicitly as non-local (using nonlocal keyword) in order to modify them.
def print_msg(msg):
    # This is the outer enclosing function
    def printer():
        # This is the nested function
        print(msg)
    printer()
which outputs:
local to the function. The nonlocal and global declarations cause it to refer to the
variable that exists outside of the function. In other words, nonlocal lets you assign
values to a variable in an outer (but non-global) scope similar to how global lets
you assign values to a variable in a global scope. See PEP 3104 for more details on
nonlocal .
Note that if a function does not assign to a variable, then the declarations are not
needed, and it automatically looks for it in a higher scope.
As an example, consider the following code snippet that does not use nonlocal :
x = 0
def outer():
    x = 1
    def inner():
        x = 2
        print("inner:", x)
    inner()
    print("outer:", x)
outer()
print("global:", x)
# inner: 2
# outer: 1
# global: 0
The following code snippet is a variation of the above which uses nonlocal , where
inner() ’s x is now also outer() ’s x :
x = 0
def outer():
    x = 1
    def inner():
        nonlocal x
        x = 2
        print("inner:", x)
    inner()
    print("outer:", x)
outer()
print("global:", x)
# inner: 2
# outer: 2
# global: 0
x = 0
def outer():
    x = 1
    def inner():
        global x
        x = 2
        print("inner:", x)
    inner()
    print("outer:", x)
outer()
print("global:", x)
# inner: 2
# outer: 1
# global: 2
Closure
Type Hinting
Introduced in Python 3.5, the typing module supports type hints, which document
the types that function arguments, return values, and container contents are expected to have.
In the function greeting below, the argument name is expected to be of type str
(annotated as name: str ) and the return type str . Subtypes are accepted as
arguments.
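A sketch of what such an annotated function could look like is shown below. The body is an assumption, since only the signature is described above.

```python
def greeting(name: str) -> str:
    # The annotations document intent; Python does not enforce them at runtime.
    return "Hello " + name

message = greeting("World")
print(message)                   # Hello World
print(greeting.__annotations__)  # {'name': <class 'str'>, 'return': <class 'str'>}
```

A static type checker reads these annotations to flag calls such as greeting(42) before the code runs.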
Type Aliases
A type alias is defined by assigning the type to the alias. In this example, Vector
and list[float] are treated as interchangeable synonyms:
Vector = list[float]
Type aliases are useful for simplifying complex type signatures. For example:
# The static type checker will treat the previous type signature as
# being exactly equivalent to this one.
def broadcast_message(
        message: str,
        servers: Sequence[tuple[tuple[str, int], dict[str, str]]]) -> None:
    ...
Note that None as a type hint is a special case and is replaced by type(None) .
Any
The Any type is special in that it indicates an unconstrained datatype. A static type
checker will treat every type as being compatible with Any and Any as being
compatible with every type.
This means that it is possible to perform any operation or method call on a value of
type Any and assign it to any variable:
from typing import Any
a: Any = None
a = [] # OK
a = 2 # OK
s: str = ''
s = a # OK
Furthermore, all functions without a return type or parameter types will implicitly
default to using Any :
def legacy_parser(text):
    ...
    return data
This behavior allows Any to be used as an escape hatch when you need to mix
dynamically and statically typed code.
Tuple
Tuple type; Tuple[X, Y] is the type of a tuple of two items with the first item of type
X and the second of type Y . The type of the empty tuple can be written as
Tuple[()] .
To specify a variable-length tuple of homogeneous type, use literal ellipsis, e.g.
Tuple[int, ...] . A plain Tuple is equivalent to Tuple[Any, ...] , and in turn to tuple .
In the example below, we’re expecting the type of the points variable to be a tuple
that contains two floats within.
List
The List type works much like the Tuple type, except that, as the name suggests, it annotates lists.
In the example below, we’re expecting the function to return a list of dicts that map
strings as keys to strings as values.
Union
Used to signify support for two or more datatypes; Union[X, Y] is equivalent to X | Y .
To define a union, use e.g. Union[int, str] or the shorthand int | str . Using the
shorthand version is recommended. Details:
Union[int] == int # The constructor actually returns int
Optional
Optional[X] is equivalent to X | None (or Union[X, None] ). In other words,
Optional[...] is a shorthand notation for Union[..., None] , telling the type checker
that either an object of the specific type is required, or None is required. Note that
... stands for any valid type hint, including complex compound types or a Union[]
of more types.
Note that this is not the same concept as an optional argument, which is one that
has a default. An optional argument with a default does not require the Optional
qualifier on its type annotation just because it is optional. For example:
On the other hand, if an explicit value of None is allowed, the use of Optional is
appropriate, whether the argument is optional or not. For example:
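The two cases can be contrasted in a short sketch. The function names repeat and find_user are hypothetical and serve only to illustrate the distinction.

```python
from typing import Optional

def repeat(text: str, times: int = 2) -> str:
    # "times" is an optional argument (it has a default), but None is not
    # an allowed value, so its annotation does not need Optional.
    return text * times

def find_user(user_id: Optional[int] = None) -> str:
    # None is an explicitly allowed value here, so Optional is appropriate.
    if user_id is None:
        return "anonymous"
    return f"user-{user_id}"

print(repeat("ab"))    # abab
print(find_user())     # anonymous
print(find_user(7))    # user-7
```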
Thus, whenever you have a keyword argument with default value None , you should
use Optional . (Note: If you are targeting Python 3.10 or newer, PEP 604 introduced
a better syntax, see below).
As an example, if you have dict and list container types, but the default
value for a keyword argument shows that None is permitted too, use
Optional[...] :
As a recommendation, stick to using Optional[] when setting the type for a keyword
argument that uses = None to set a default value; this better documents the reason why
None is allowed. Moreover, it makes it easier to move the Union[...] part into
For example:
then documentation is improved by pulling out the Union[str, int] into a type alias:
def api_function(optional_argument: Optional[IdTypes] = None) -> None:
    """API Function that does blah.
The refactor to move the Union[] into an alias was made all the easier
because Optional[...] was used instead of Union[str, int, None] . The None value is
not an ID type after all; it’s not part of the value. None is meant to flag the absence of
a value.
When only reading from a container type, you may just as well accept any
immutable abstract container type; lists and tuples are Sequence objects, while dict
is a Mapping type:
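A sketch of read-only parameters annotated with the abstract types follows. The function names are hypothetical, and the imports come from typing so that the subscripted forms also work on Python versions before 3.9.

```python
from typing import Mapping, Sequence

def total_length(words: Sequence[str]) -> int:
    # Only reads from "words", so any sequence (list, tuple, ...) is acceptable.
    return sum(len(w) for w in words)

def lookup(prices: Mapping[str, float], item: str) -> float:
    # Only reads from "prices", so any mapping is acceptable.
    return prices.get(item, 0.0)

from_list = total_length(["ab", "c"])      # 3
from_tuple = total_length(("ab", "c"))     # 3
price = lookup({"apple": 1.5}, "apple")    # 1.5
```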
In Python 3.9 and up, the standard container types have all been updated to
support using them in type hints, see PEP 585. But, while you now can use
dict[str, int] or list[Union[int, str]] , you still may want to use the more
expressive Mapping and Sequence annotations to indicate that a function won’t be
mutating the contents (they are treated as ‘read only’), and that the functions would
work with any object that works as a mapping or sequence, respectively.
Python 3.10 introduces the | union operator into type hinting, see PEP 604.
Instead of Union[str, int] you can write str | int . In line with other type-hinted
languages, the preferred (and more concise) way to denote an optional argument in
Python 3.10 and up, is now Type | None , e.g. str | None or list | None .
Control Flow
if Statement
some_var = 5 # Any value below 10 produces the output noted below
# Here is an if statement.
# This prints "some_var is smaller than 10."
if some_var > 10:
    print("some_var is totally bigger than 10.")
elif some_var < 10: # This elif clause is optional.
    print("some_var is smaller than 10.")
else: # This is optional too.
    print("some_var is indeed 10.")
Conditional Expression
if can also be used as an expression to form a conditional expression, as an
equivalent of C’s ?: ternary operator:
Conditional expressions can be used in all kinds of situations where you want to
choose between two expression values based on some condition:
value = 123
print(value, 'is', 'even' if value % 2 == 0 else 'odd')
for Loop
for loops iterate over iterables such as lists, tuples, dictionaries, and sets.
which outputs:
dog is a mammal
cat is a mammal
mouse is a mammal
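A loop that produces the output above might look like the following sketch. The list literal and the variable names are assumptions, since the original snippet is not shown here.

```python
animals = ["dog", "cat", "mouse"]
sentences = []
for animal in animals:
    # The loop variable takes each element of the list in turn
    sentence = f"{animal} is a mammal"
    sentences.append(sentence)
    print(sentence)
```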
range() and for loops are a powerful combination. range(number) returns an
iterable of numbers from zero up to (but not including) the given number. More on
range() in its dedicated section.
for i in range(4):
    print(i)
which outputs:
0
1
2
3
range(lower, upper) returns an iterable of numbers from the lower number up to
(but not including) the upper number.
which outputs:
range(lower, upper, step) returns an iterable of numbers from the lower number up to
(but not including) the upper number, while incrementing by step. If step is not
indicated, the default value is 1 .
which outputs:
To loop over a list, and retrieve both the index and the value of each item in the list,
use enumerate(iterable) :
which outputs:
For an in-depth treatment on how for loops work in Python, refer to our section on
The Iterator Protocol.
else Clause
for loops also have an else clause, which many of us are unfamiliar with. The else
clause executes after the loop completes normally, meaning that the loop did not
encounter a break statement. It is really useful once you understand where to
use it.
The common construct is to run a loop and search for an item. If the item is found,
we break out of the loop using the break statement. There are two scenarios in
which the loop may end. The first one is when the item is found and break is
encountered. The second scenario is that the loop ends without encountering a
break statement. Now we may want to know which one of these is the reason for a
loop’s completion. One method is to set a flag and then check it once the loop ends.
Another is to use the else clause.
found_obj = None
for obj in objects:
    if obj.key == search_key:
        # Found it!
        found_obj = obj
        break
else:
    # Didn't find anything
    print('No object found.')
Using for else or while else blocks in production code is not recommended owing
to their obscurity. Thus, anytime you see this construct, a better alternative is to
either encapsulate the search in a function:
def find_obj(search_key):
    for obj in objects:
        if obj.key == search_key:
            return obj
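The list comprehension alternative is not shown in the text; a hedged sketch of it, next to the function version, might look like the following. The Item type, the objects list, and the search key are hypothetical stand-ins.

```python
from collections import namedtuple

# Hypothetical stand-ins for the objects being searched.
Item = namedtuple("Item", ["key", "name"])
objects = [Item(1, "first"), Item(2, "second"), Item(3, "third")]

def find_obj(search_key):
    """Return the first object with a matching key, or None if there is none."""
    for obj in objects:
        if obj.key == search_key:
            return obj
    return None

# List comprehension version: always scans the whole list.
search_key = 2
matching_objs = [obj for obj in objects if obj.key == search_key]
if matching_objs:
    print("Found {}".format(matching_objs[0]))
else:
    print("No object found.")
```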
Note that the list comprehension version is not semantically equivalent to the
other two versions, but it works well enough for non-performance-critical code
where it doesn’t matter whether you iterate over the whole list or not.
Consider a simple example, which finds factors for numbers between 2 and 10:
We can add an else block that catches the numbers which have no
factors and are therefore prime numbers:
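A sketch of that idea, assuming the numbers run from 2 up to (but not including) 10 and collecting the messages in a list so they can be inspected:

```python
factor_lines = []
for n in range(2, 10):
    for x in range(2, n):
        if n % x == 0:
            factor_lines.append("{} equals {} * {}".format(n, x, n // x))
            break
    else:
        # Runs only when the inner loop finished without hitting break.
        factor_lines.append("{} is a prime number".format(n))

print("\n".join(factor_lines))
```

The else branch belongs to the inner for loop, not to the if statement, which is why it fires exactly for the numbers with no factor.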
while Loop
While loops go on until a condition is no longer met:
x = 0
while x < 4:
    print(x)
    x += 1  # Shorthand for x = x + 1
which outputs:
0
1
2
3
Lambda Functions
Lambda expressions are a special syntax in Python for creating anonymous
functions. The lambda syntax itself is generally referred to as a lambda expression,
while the function you get back from this is called a lambda function.
Python’s lambda expressions allow a function to be created and passed around
(often into another function) all in one line of code.
def normalize_case(string):
    return string.casefold()
Lambda expressions are just a special syntax for making functions. They can only
contain a single expression, and they return the result of that expression
automatically.
The inherent limitations of lambda expressions are actually part of their appeal.
When an experienced Python programmer sees a lambda expression they know
that they’re working with a function that is only used in one place and does just
one thing.
Python’s built-in sorted function accepts a function as its key argument. This key
function is used to compute a comparison key when determining the sorting order of
items.
So sorted is a great example of a place that lambda expressions are often used:
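For instance, a sketch with an assumed list of mixed-case color names:

```python
colors = ["Goldenrod", "purple", "Salmon", "turquoise", "cyan"]  # assumed sample data
sorted_colors = sorted(colors, key=lambda s: s.casefold())
print(sorted_colors)
```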
The above code returns the given colors sorted in a case-insensitive way.
The sorted function isn’t the only use of lambda expressions, but it’s a common one.
The fact that lambda expressions can be passed around is their biggest benefit.
Returning automatically is neat but generally not a big benefit. I find the “single line
of code” limitation is neither good nor bad overall. The fact that lambda functions
can’t have docstrings and don’t have a name is unfortunate and their unfamiliar
syntax can be troublesome for newer Pythonistas.
Let’s take a look at the various ways lambda expressions are misused and
overused.
Misuse: Naming Lambda Expressions
PEP8, the official Python style guide, advises never to write code like this:
If you want to create a one-liner function and store it in a variable, you should use
def instead:
PEP8 recommends this because named functions are a common and easily
understood thing. This also has the benefit of giving our function a proper name,
which could make debugging easier. Unlike functions defined with def , lambda
functions never have a name (it’s always <lambda> ):
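A short sketch of the difference, reusing the normalize_case example from earlier (the name normalize_case_lambda is introduced here purely for illustration):

```python
# Discouraged: binding a lambda expression to a name.
normalize_case_lambda = lambda string: string.casefold()

# Preferred: an ordinary named function.
def normalize_case(string):
    return string.casefold()

print(normalize_case_lambda.__name__)  # <lambda>
print(normalize_case.__name__)         # normalize_case
```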
If you want to create a function and store it in a variable, define your function
using def . That’s exactly what it’s for. It doesn’t matter if your function is a single
line of code or if you’re defining a function inside of another function, def works just
fine for those use cases.
sorted_numbers = sorted(numbers, key=lambda n: abs(n))
The person who wrote this code likely learned that lambda expressions are used for
making a function that can be passed around. But they missed out on a slightly
bigger picture idea: all functions in Python (not just lambda functions) can be
passed around.
Since abs (which returns the absolute value of a number) is a function and all
functions can be passed around, we could actually have written the above code like
this:
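A sketch of the simplified call, with an assumed list of numbers:

```python
numbers = [3, -7, 1, -2]  # assumed sample data
sorted_numbers = sorted(numbers, key=abs)
print(sorted_numbers)  # [1, -2, 3, -7]
```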
Now this example might feel contrived, but it’s not terribly uncommon to overuse
lambda expressions in this way. Here’s another example I’ve seen:
Because we’re accepting exactly the same arguments as we’re passing into min ,
we don’t need that extra function call. We can just pass the min function to key
instead:
You don’t need a lambda function if you already have another function that does
what you want.
colors = ["Goldenrod", "Purple", "Salmon", "Turquoise", "Cyan"]
colors_by_length = sorted(colors, key=lambda c: (len(c), c.casefold()))
The key function here is helping us sort these colors by their length followed by
their case-normalized name.
The code below carries out the same functionality as the above code, but is much
more readable:
def length_and_alphabetical(string):
    """Return sort key: length first, then case-normalized string."""
    return (len(string), string.casefold())
This code is quite a bit more verbose, but I find the name of that key function makes
it clearer what we’re sorting by. We’re not just sorting by the length and we’re not
just sorting by the color: we’re sorting by both.
Naming functions often makes code more readable, the same way using tuple
unpacking to name variables instead of using arbitrary index-lookups often makes
code more readable.
We’re hard-coding an index lookup here to sort points by their color. If we used a
named function we could have used tuple unpacking to make this code more
readable:
def color_of_point(point):
    """Return the color of the given point."""
    (x, y), color = point
    return color
Tuple unpacking can improve readability over using hard-coded index lookups.
Using lambda expressions often means sacrificing some Python language
features, specifically those that require multiple lines of code (like an extra
assignment statement).
Python’s map and filter functions are used for looping over an iterable and making
a new iterable that either slightly changes each element or filters the iterable down
to only elements that match a certain condition. We can accomplish both of those
tasks just as well with list comprehensions or generator expressions:
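A sketch comparing both styles, assuming a small list of numbers and the two tasks of squaring and keeping odd values:

```python
numbers = [2, 1, 3, 4, 7, 11, 18]  # assumed sample data

# map and filter with lambda expressions...
squared_numbers = map(lambda n: n**2, numbers)
odd_numbers = filter(lambda n: n % 2 == 1, numbers)

# ...and the equivalent generator expressions.
squared_numbers_gen = (n**2 for n in numbers)
odd_numbers_gen = (n for n in numbers if n % 2 == 1)
```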
Personally, I’d prefer to see the above generator expressions written over multiple
lines of code (see my article on comprehensions) but I find even these one-line
generator expressions more readable than those map and filter calls.
The general operations of mapping and filtering are useful, but we really don’t need
the map and filter functions themselves. Generator expressions are a special
syntax that exists just for the tasks of mapping and filtering. So my advice is to use
generator expressions instead of the map and filter functions.
Newer Pythonistas who are keen on functional programming sometimes write code
like this:
This code adds all the numbers in the numbers list. There’s an even better way to
do this:
Python’s built-in sum function was made just for this task.
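Side by side, the two approaches might look like this (the list of numbers is assumed sample data):

```python
from functools import reduce

numbers = [2, 1, 3, 4, 7, 11, 18]  # assumed sample data

# Functional style: fold the list with an addition lambda.
total_with_reduce = reduce(lambda x, y: x + y, numbers)

# Specialized tool: the built-in sum.
total_with_sum = sum(numbers)
```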
The sum function, along with a number of other specialized Python tools, is easy
to overlook. But I’d encourage you to seek out the more specialized tools when you
need them because they often make for more readable code.
Instead of passing functions into other functions, look into whether there is a
more specialized way to solve your problem instead.
The above lambda expression is necessary because we’re not allowed to pass the *
operator around as if it were a function. If there was a function that was
equivalent to * , we could pass it into the reduce function instead.
Python’s standard library actually has a whole module meant to address this
problem:
Python’s operator module exists to make various Python operators easy to use as
functions. If you’re practicing functional(ish) programming, Python’s operator
module is your friend.
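As a sketch, operator.mul can stand in for a multiplication lambda when computing a product with reduce (the numbers are assumed sample data):

```python
from functools import reduce
from operator import mul

numbers = [2, 1, 3, 4, 7, 11, 18]  # assumed sample data

product_with_lambda = reduce(lambda x, y: x * y, numbers)
product_with_mul = reduce(mul, numbers)  # mul(x, y) is the same as x * y
```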
And methodcaller for calling methods on an object:
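A minimal sketch, assuming we want a case-insensitive sort of some color names:

```python
from operator import methodcaller

colors = ["Goldenrod", "purple", "Salmon", "turquoise", "cyan"]  # assumed sample data

# methodcaller("casefold") builds a function that calls .casefold() on its argument.
sorted_colors = sorted(colors, key=methodcaller("casefold"))
```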
Functions in the operator module typically make code more readable than using the
equivalent lambda expressions.
Compare this:
To this:
def multiply_all(numbers):
    """Return the product of the given numbers."""
    product = 1
    for n in numbers:
        product *= n
    return product
The second version is longer, but folks without a functional programming background
will often find it easier to understand.
Anyone who has gone through one of my Python training courses can probably
understand what that multiply_all function does, whereas that reduce / lambda
combination is likely a bit more cryptic for many Python programmers.
In general, passing one function into another function tends to make code
more complex, which can hurt readability.
which outputs:
This approach only works on sequences, which are data types that have indexes
from 0 to one less than their length. Sequences have three important properties:
Lists, strings, and tuples are sequences. Dictionaries, sets, and many other
iterables are not sequences.
The key takeaway here is that this looping construct, which essentially indexes the
iterable, does not work on all iterables, but only on sequences.
Iterables
Python offers a fundamental abstraction called the iterable. Formally, an iterable is
any Python object capable of returning its members one at a time, permitting it to be
iterated over in a for-loop.
Like we discussed in the prior section, iterables can either be sequences or not.
Here’s an infinite iterable which provides every multiple of 5 as you loop over it:
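One way to build such an iterable, sketched here with itertools.count (an assumption about how multiples_of_five is defined):

```python
from itertools import count, islice

multiples_of_five = count(step=5)  # 0, 5, 10, 15, ... without end

# Peek at the first few values of a separate, identical iterable.
first_four = list(islice(count(step=5), 4))
```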
When we were using for loops, we could have looped over the beginning of this
iterable like this:
for n in multiples_of_five:
    if n > 100:
        break
    print(n)
If we removed the break condition from the aforementioned for loop, it would
simply go on printing forever.
So iterables can be infinitely long, which means that we can’t always convert an
iterable to a list (or any other sequence) before we loop over it. We need to
somehow ask our iterable for each item individually, the same way
our for loop works.
Iterators
While an iterable is anything you’re able to loop over, an iterator is the object that
does the actual iterating.
Iterators have exactly one job: return the “next” item in our iterable. They’re sort of
like tally counters, but they don’t have a reset button and instead of giving the next
number they give the next item in our iterable.
All iterables can be passed to the built-in iter function to get an iterator from them:
iterator = iter('hi')
next(iterator) # Returns "h"
next(iterator) # Returns "i"
next(iterator)
which outputs:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
So iterators can be passed to the built-in next function to get the next item from
them, and if there is no next item (because we reached the end), a StopIteration
exception is raised.
There’s actually a bit more to it than that though. You can pass iterators to the built-
in iter function to get themselves back. That means that iterators are also
iterables.
iterator = iter('hi')
iterator2 = iter(iterator)
iterator is iterator2 # Returns "True"
Iterables:
Iterators:
Can be passed to the next function which gives their next item or raises
StopIteration .
The inverse of these statements should also hold true. Which means:
Anything that can be passed to next without an error (except for StopIteration)
is an iterator.
This while loop manually loops over some iterable, printing out each item as it
goes:
def print_each(iterable):
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break  # Iterator exhausted: stop the loop
        else:
            print(item)
We can call this function with any iterable and it will loop over it:
which outputs:
The above function is essentially the same as this one which uses a for loop:
def print_each(iterable):
    for item in iterable:
        print(item)
This for loop is automatically doing what we were doing manually: calling iter to
get an iterator and then calling next over and over until a StopIteration exception is
raised.
The iterator protocol is used by for loops, tuple unpacking, and all built-in functions
that work on generic iterables. Using the iterator protocol (either manually or
automatically) is the only universal way to loop over any iterable in Python.
Key takeaways
Looping over iterables works via getting an iterator from an iterable and then
repeatedly asking the iterator for the next item.
The way iterators and iterables work is called the iterator protocol. List
comprehensions, tuple unpacking, for loops, and all other forms of iteration
rely on the iterator protocol.
filled_dict = {"one": 1, "two": 2, "three": 3}
our_iterable = filled_dict.keys()
print(our_iterable)  # Prints dict_keys(['one', 'two', 'three']). This is an iterable object.
# However, we cannot address elements by index, since a dict's keys are not a sequence
# (although they are iterable).
our_iterable[1]  # Raises a TypeError
# An iterable is an object that knows how to create an iterator.
our_iterator = iter(our_iterable)
# Our iterator is an object that can remember the state as we traverse through it.
# We get the next object with "next()".
next(our_iterator)  # Returns "one"
# It maintains state as we iterate.
next(our_iterator)  # Returns "two"
next(our_iterator)  # Returns "three"
# After the iterator has returned all of its data, it raises a StopIteration exception.
next(our_iterator)  # Raises StopIteration
# We can also loop over it; in fact, "for" does this implicitly!
our_iterator = iter(our_iterable)
for i in our_iterator:
    print(i)  # Prints one, two, three
# You can grab all the elements of an iterable or iterator by calling list() on it.
list(our_iterable)  # Returns ["one", "two", "three"]
list(our_iterator)  # Returns [] because state is saved
Using an iterator instead of a list, set, or another iterable data structure can
sometimes allow us to save memory. For example, we can use itertools.repeat() to
create an iterable that provides 100 million 4’s to us:
from itertools import repeat
lots_of_fours = repeat(4, times=100_000_000)
This iterator takes up 56 bytes of memory (this number can vary depending on the
architectural specification of your machine):
import sys
sys.getsizeof(lots_of_fours) # Returns "56"
While iterators can save memory, they can also save time. For example if you
wanted to print out just the first line of a 10 gigabyte log file, you could do this:
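A sketch of the idea, using a small temporary file as a stand-in for the 10 gigabyte log (the file contents and the name log_path are made up for this example):

```python
import os
import tempfile

# Create a small stand-in for the large log file (hypothetical contents).
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False, encoding="utf-8") as tmp:
    tmp.write("first line\nsecond line\nthird line\n")
    log_path = tmp.name

# next() pulls a single line from the file iterator; the rest is never requested.
with open(log_path, encoding="utf-8") as log_file:
    first_line = next(log_file)

print(first_line, end="")
os.remove(log_path)
```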
File objects in Python are implemented as iterators. As you loop over a file, data is
read into memory one line at a time. If we instead used the readlines method to
store all lines in memory, we might run out of system memory.
So iterators can save us memory, and can sometimes save us time also.
Additionally, iterators have abilities that other iterables don’t. For example, the
laziness of iterators can be used to make iterables that have an unknown length. In
fact, you can even make infinitely long iterators.
For example, the itertools.count() utility will give us an iterator that will provide
every number from 0 upward as we loop over it:
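A minimal sketch; the cut-off at 3 is arbitrary and only there so that the loop terminates:

```python
from itertools import count

seen = []
for n in count():
    if n > 3:
        break  # without this, the loop would never end
    seen.append(n)
    print(n)
```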
which outputs:
Let’s make our own iterators. We’ll start by re-inventing the itertools.count()
iterator object.
class Count:
    """Iterator that counts upward forever."""
    def __init__(self, start=0):
        self.num = start
    def __iter__(self):
        return self
    def __next__(self):
        num = self.num
        self.num += 1
        return num
This class has an initializer that initializes our current number to 0 (or whatever is
passed in as the start). The things that make this class usable as an iterator are the
__iter__ and __next__ methods.
When an object is passed to the str built-in function, its __str__ method is called.
When an object is passed to the len built-in function, its __len__ method is called.
numbers = [1, 2, 3]
str(numbers), numbers.__str__()  # Returns "('[1, 2, 3]', '[1, 2, 3]')"
len(numbers), numbers.__len__()  # Returns "(3, 3)"
Calling the built-in iter function on an object will attempt to call its __iter__
method. Calling the built-in next function on an object will attempt to call its __next__
method.
The iter function is supposed to return an iterator. So our __iter__ method must
return an iterator. But our object is an iterator, so it should return itself. Therefore
our Count object returns self from its __iter__ method because it is its own iterator.
The next function is supposed to return the next item in our iterator or raise a
StopIteration exception when there are no more items. We’re returning the current
number and incrementing the number so it’ll be larger during the next __next__ call.
We can manually loop over our Count iterator class like this:
We could also loop over our Count object using a for loop, as with any other
iterable:
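Both styles are sketched below. The Count class is restated (with the initializer described earlier) so that the example runs on its own, and the start value and cut-off are arbitrary choices:

```python
class Count:
    """Iterator that counts upward forever (restated so the sketch is self-contained)."""

    def __init__(self, start=0):
        self.num = start

    def __iter__(self):
        return self

    def __next__(self):
        num = self.num
        self.num += 1
        return num

# Manual looping with next().
c = Count()
first, second = next(c), next(c)

# Looping with for; break is needed because Count never raises StopIteration.
collected = []
for n in Count(start=5):
    if n > 8:
        break
    collected.append(n)
    print(n)
```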
which outputs:
This object-oriented approach to making an iterator is cool, but it’s not the usual
way that Python programmers make iterators. Usually when we want an iterator, we
create a generator, which brings us to the topic of our next section.
Generators
Generators are an easy way to make iterators.
What separates generators from typical iterators is the fact that they offer lazy (on
demand) generation of values, which translates to lower memory usage.
Furthermore, we do not need to wait until all the elements have been generated
before we start to use them, which yields a performance improvement.
Note that a generator will provide performance benefits only if we do not intend to
use the set of generated values more than once.
sum_of_first_n = sum(firstn(1000000))
The code is quite simple and straightforward, but it builds the full list in memory.
This is clearly not acceptable in our case, because we cannot afford to keep all “10
megabyte” integers in memory.
Let’s switch over to generators to figure out how they help solve the aforementioned
problem. Generators are memory-efficient because they only load the data needed
to process the next value in the iterable. This allows them to perform operations on
otherwise prohibitively large value ranges. The following implements a generator as
an iterable object:
class firstn(object):
    def __init__(self, n):
        self.n = n
        self.num = 0

    def __iter__(self):
        return self

    # Python 3 compatibility
    def __next__(self):
        return self.next()

    def next(self):
        if self.num < self.n:
            cur, self.num = self.num, self.num+1
            return cur
        else:
            raise StopIteration()

sum_of_first_n = sum(firstn(1000000))
Furthermore, this is a pattern that we will use over and over for many similar
constructs. Imagine writing all that just to get an iterator!
This leads us to the two ways to easily create generators in Python: generator
functions and generator expressions.
Generator Functions
Generator functions differ from plain old functions based on the fact that they have
one or more yield statements.
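A sketch of firstn written as a generator function, assuming it should produce the integers from 0 up to n - 1 like the class above:

```python
def firstn(n):
    """Yield the integers 0, 1, ..., n - 1 one at a time."""
    num = 0
    while num < n:
        yield num  # execution pauses here until the next value is requested
        num += 1
```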
result = firstn(1000000)
result # Returns "<generator object firstn at some address>"
type(result) # Returns "<class 'generator'>"
next(result) # Returns 0
next(result) # Returns 1
next(result) # Returns 2
Alternatively, note that calling iter on the generator object returned by a generator
function simply gives back that same generator object:
result = iter(firstn(1000000))
result # Returns "<generator object firstn at some address>"
Like we discussed earlier, note that the mere presence of a yield statement turns a
function into a generator function.
An operation such as sum() consumes every element of the generator, but it pulls
the values one at a time, so the full list is never held in memory:
sum_of_first_n = sum(firstn(1000000))
You can use a for loop over this generator which automatically calls on next() to
loop through elements:
Note that this function is considerably shorter (with much less boilerplate code) than
the firstn class we created in the previous section.
We can make a generator that will lazily provide us with all the squares of these
numbers like this:
def square_all(numbers):
    for n in numbers:
        yield n**2
squares = square_all(favorite_numbers)
Generator Expressions
Similar to list comprehensions, you can create generator comprehensions as well,
which are more commonly known as generator expressions. In other words,
generator expressions offer list comprehension-like syntax that allows us to create
generators.
for x in values:
    print(x)
# list comprehension
doubles = [2 * n for n in range(50)]
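For comparison, a sketch of the same computation as a generator expression (the names doubles_list and doubles_gen are chosen here for illustration):

```python
# List comprehension: builds all 50 values up front.
doubles_list = [2 * n for n in range(50)]

# Generator expression: same syntax with parentheses, values produced lazily.
doubles_gen = (2 * n for n in range(50))

first_three = [next(doubles_gen) for _ in range(3)]
total_of_rest = sum(doubles_gen)  # consumes the remaining 47 values
```

Note that the generator is exhausted after the sum, whereas the list can be traversed again.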
As another example, here’s a generator expression that filters empty lines from a
file and strips newlines from the end:
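A sketch of such an expression; io.StringIO stands in for an opened text file, and the name poem_file and its contents are assumptions:

```python
import io

# io.StringIO stands in for an opened text file (hypothetical contents).
poem_file = io.StringIO("Roses are red\n\nViolets are blue\n\n")

# Skip lines that consist only of a newline, and strip the trailing newline.
lines = (line.rstrip("\n") for line in poem_file if line != "\n")
non_empty = list(lines)
```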
We can make a generator that will lazily provide us with all the squares of these
numbers like this:
First, let’s talk about terminology. The word “generator” is used in quite a few ways
in Python:
Second, you can also copy-paste your way from a generator function to a function
that returns a generator expression:
def get_a_generator(some_iterable):
    for item in some_iterable:
        if some_condition(item):
            yield item

def get_a_generator(some_iterable):
    return (item for item in some_iterable if some_condition(item))
If you can’t write your generator function in that form, then you can’t create a
generator expression to replace it.
Decorators
Decorators are functions that wrap around other functions.
In the below example, beg wraps say . If say_please is True then it will change the
returned message.
def beg(target_function):
    def wrapper(*args, **kwargs):
        msg, say_please = target_function(*args, **kwargs)
        if say_please:
            return "{} {}".format(msg, "Please! I am poor :(")
        return msg
    return wrapper

@beg
def say(say_please=False):
    msg = "Can you buy me a beer?"
    return msg, say_please
Another example:
def my_simple_logging_decorator(func):
    def you_will_never_see_this_name(*args, **kwargs):
        print('Calling {}'.format(func.__name__))
        return func(*args, **kwargs)
    return you_will_never_see_this_name

@my_simple_logging_decorator
def double(x):
    'Doubles a number.'
    return 2 * x
def makebold(fn):
    def wrapped(*args, **kwargs):
        return "<b>" + fn(*args, **kwargs) + "</b>"
    return wrapped

def makeitalic(fn):
    def wrapped(*args, **kwargs):
        return "<i>" + fn(*args, **kwargs) + "</i>"
    return wrapped

@makebold
@makeitalic
def hello():
    return "hello world"

@makebold
@makeitalic
def log(s):
    return s
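To see what stacking does, here is a self-contained sketch that repeats the two decorators and calls the result. Decorators are applied from the bottom up, so makeitalic wraps hello first and makebold wraps that.

```python
def makebold(fn):
    def wrapped(*args, **kwargs):
        return "<b>" + fn(*args, **kwargs) + "</b>"
    return wrapped

def makeitalic(fn):
    def wrapped(*args, **kwargs):
        return "<i>" + fn(*args, **kwargs) + "</i>"
    return wrapped

@makebold
@makeitalic
def hello():
    return "hello world"

# Equivalent to: hello = makebold(makeitalic(hello))
print(hello())         # <b><i>hello world</i></b>
print(hello.__name__)  # wrapped (the inner function's name replaces the original)
```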
File I/O
In this section, you’ll learn about Python file operations. More specifically, opening a
file, reading from it, writing into it, closing it, and various file methods that you
should be aware of.
Files
Files are named locations on disk to store related information. They are used to
permanently store data in a non-volatile memory (e.g. hard disk).
Since Random Access Memory (RAM) is volatile (it loses its data when the
computer is turned off), we use files to store data permanently for future
use.
When we want to read from or write to a file, we need to open it first. When we are
done, it needs to be closed so that the resources that are tied with the file are freed.
1. Open a file
We can specify the mode while opening a file. In mode, we specify whether we want
to read r, write w or append a to the file. We can also specify if we want to open the
file in text mode or binary mode.
The default is reading in text mode. In this mode, we get strings when reading from
the file.
On the other hand, binary mode returns bytes and this is the mode to be used when
dealing with non-text files like images or executable files.
Mode Description
r Opens a file for reading. (default)
w Opens a file for writing. Creates a new file if it does not exist or truncates the file if it exists.
x Opens a file for exclusive creation. If the file already exists, the operation fails.
a Opens a file for appending at the end of the file without truncating it. Creates a new file if it does not exist.
t Opens in text mode. (default)
b Opens in binary mode.
+ Opens a file for updating (reading and writing).
As an example:
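A sketch of a few mode combinations, using a throwaway file in a temporary directory (the path and contents are assumptions made for this example):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")  # throwaway location

f = open(path, "w", encoding="utf-8")  # write in text mode
f.write("hello\n")
f.close()

f = open(path, encoding="utf-8")       # equivalent to mode "r" or "rt"
text = f.read()                        # text mode gives us a str
f.close()

f = open(path, "rb")                   # read in binary mode
raw = f.read()                         # binary mode gives us bytes
f.close()
```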
Unlike other languages, the character a does not imply the number 97 until it is
encoded using ASCII (or other equivalent encodings).
So, we must not rely on the default encoding either, or else our code will behave
differently on different platforms.
Hence, when working with files in text mode, it is highly recommended to specify the
encoding type.
f = open("test.txt", mode='r', encoding='utf-8')
Closing a file will free up the resources that were tied with the file. It is done using
the close() method available in Python.
Python has a garbage collector to clean up unreferenced objects but we must not
rely on it to close the file.
This method is not entirely safe. If an exception occurs when we are performing
some operation with the file, the code exits without closing the file.
f = open("test.txt", encoding='utf-8')  # opened before try, so f always exists in finally
try:
    # perform file operations
    pass
finally:
    f.close()
This way, we are guaranteeing that the file is properly closed even if an exception is
raised that causes program flow to stop.
The best way to close a file is by using the with statement. This ensures that the
file is closed when the block inside the with statement is exited.
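A minimal sketch, writing to a throwaway file in a temporary directory (the path is an assumption made for this example):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")  # throwaway location

with open(path, "w", encoding="utf-8") as f:
    f.write("some text\n")
    closed_inside = f.closed  # False: the file is still open inside the block

closed_after = f.closed       # True: closed automatically when the block exits
```

There is no explicit call to close(); the with statement takes care of it, even if an exception is raised inside the block.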
Writing to Files in Python
In order to write into a file in Python, we need to open it in write w , append a or
exclusive creation x mode.
We need to be careful with the w mode, as it will overwrite the file if it already
exists. Due to this, all the previous data are erased.
Writing a string or sequence of bytes (for binary files) is done using the write()
method. This method returns the number of characters written to the file.
This program will create a new file named test.txt in the current directory if it does
not exist. If it does exist, it is overwritten.
We must include the newline characters ourselves to distinguish the different lines.
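A sketch of writing three lines, using a temporary directory instead of the current one so that nothing is overwritten by accident (the path is an assumption; the text matches the file contents shown later in this section):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")  # throwaway location

with open(path, "w", encoding="utf-8") as f:
    # write() returns the number of characters written each time.
    counts = [
        f.write("This is my first file\n"),
        f.write("This file\n"),
        f.write("contains three lines\n"),
    ]

with open(path, encoding="utf-8") as f:
    contents = f.read()
```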
Note that all these reading methods return empty values when the end of file (EOF)
is reached.
read()
We can use the read(size) method to read in the size number of data. If the size
parameter is not specified, it reads and returns up to the end of the file.
We can read the test.txt file we wrote in the above section in the following way:
# Read in the rest till end of file
f.read() # Returns 'my first file\nThis file\ncontains three lines\n'
We can see that the read() method returns a newline as '\n' . Once the end of the
file is reached, we get an empty string on further reading.
We can change our current file cursor (position) using the seek() method. Similarly,
the tell() method returns our current position (in number of bytes).
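A sketch of moving the cursor around; the file is recreated in a temporary directory so that the example is self-contained:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")  # throwaway location
with open(path, "w", encoding="utf-8") as f:
    f.write("This is my first file\nThis file\ncontains three lines\n")

with open(path, encoding="utf-8") as f:
    first_word = f.read(4)  # read the first 4 characters
    position = f.tell()     # current position after that read
    f.seek(0)               # move the cursor back to the start
    whole_file = f.read()   # reads everything again from the beginning
```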
# Outputs:
# This is my first file
# This file
# contains three lines
for Loop
We can read a file line-by-line using a for loop. This is both efficient and fast. Note
that with this setup, the lines in the file itself include a newline character \n . So, we
use the end parameter of the print() function to avoid two newlines when printing.
for line in f:
    print(line, end='')
# Outputs:
# This is my first file
# This file
# contains three lines
Note that the above is a common method for “lazy” reading of big files in Python,
especially when reading large files on a system with limited memory.
for line in open('really_big_file.dat'):
    process_data(line)
with open('really_big_file.dat') as f:
    for piece in read_in_chunks(f):
        process_data(piece)
f = open('really_big_file.dat')
def read1k():
    return f.read(1024)
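The read_in_chunks helper used above is not defined in the text. One plausible sketch is a generator that reads fixed-size pieces until the file is exhausted; the chunk_size default and the io.StringIO stand-in are assumptions.

```python
import io

def read_in_chunks(file_object, chunk_size=1024):
    """Lazily yield successive pieces of at most chunk_size characters."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break  # an empty result means end of file
        yield data

# io.StringIO stands in for a really big file in this sketch.
fake_file = io.StringIO("x" * 2500)
piece_sizes = [len(piece) for piece in read_in_chunks(fake_file)]

# A helper like read1k() can be driven by the two-argument form of iter(),
# which keeps calling the function until it returns the sentinel ''.
fake_file.seek(0)

def read1k():
    return fake_file.read(1024)

sentinel_sizes = [len(piece) for piece in iter(read1k, "")]
```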
readline()
We can use the readline() method to read individual lines of a file. This method
reads a file till the newline, including the newline character.
readlines()
readlines() returns a list of remaining lines of the entire file.
f.readlines()  # Returns ['This is my first file\n', 'This file\n', 'contains three lines\n']
Here is the complete list of methods in text mode with a brief description:
Method Description
close() Closes an opened file. It has no effect if the file is already closed.
read(n) Reads at most n characters from the file. Reads till end of file if it is negative or None .
readline(n=-1) Reads and returns one line from the file. Reads in at most n bytes if specified.
readlines(n=-1) Reads and returns a list of lines from the file. Reads in at most n bytes/characters if specified.
truncate(size=None) Resizes the file stream to size bytes. If size is not specified, resizes to current location.
write(s) Writes the string s to the file and returns the number of characters written.
writelines(lines) Writes a list of lines to the file.
Magic Methods
Magic methods are special methods that you can define to add “magic” to your
classes. They’re always surrounded by double leading and trailing underscores
(e.g. __init__() or __lt__() ).
Actually, it’s a method called __new__() , which actually creates the instance, then
passes any arguments at creation on to the initializer. At the other end of the
object’s lifespan, there’s __del__() . Let’s take a closer look at these 3 magic
methods:
__new__() is the first method to get called in an object’s instantiation. It takes the
class, then any other arguments that it will pass along to __init__() . __new__()
is used fairly rarely, but it does have its purposes, particularly when subclassing
an immutable type like a tuple or a string. I don’t want to go in to too much detail
on __new__() because it’s not too useful, but it is covered in great detail in the
Python docs.
__init__() is the initializer for the class. It gets passed whatever the primary
constructor was called with (so, for example, if we called x = SomeClass(10,
x (so that code would not translate to x.__del__() ). Rather, it defines behavior
for when an object is garbage collected. It can be quite useful for objects that
might require extra cleanup upon deletion, like sockets or file objects. Be
careful, however, as there is no guarantee that __del__() will be executed if the
object is still alive when the interpreter exits, so __del__() can’t serve as a
replacement for good coding practices (like always closing a connection when
you’re done with it). In fact, __del__() should almost never be used because of
the precarious circumstances under which it is called; use it with caution!
__cmp__() should return a negative integer if self < other, zero if self == other, and
a positive integer if self > other. It’s usually best to define each comparison you need
rather than define them all at once, but __cmp__() can be a good way to save repetition
and improve clarity when you need all comparisons implemented with similar
criteria. Note that __cmp__() is only honored by Python 2; Python 3 relies on the
individual rich comparison methods such as __eq__() and __lt__() .
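Python 3 ignores __cmp__() , but a similar saving in repetition is available there through functools.total_ordering, which fills in the remaining comparisons from __eq__() and one ordering method. The Version class below is a hypothetical example, not something taken from the text.

```python
from functools import total_ordering

@total_ordering
class Version:
    """Hypothetical example class compared by its (major, minor) pair."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()

# <=, > and >= are derived automatically by total_ordering.
is_older = Version(1, 2) < Version(1, 10)
is_newer_or_equal = Version(2, 0) >= Version(1, 9)
```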
For organization’s sake, we’ve split the numeric magic methods into 5 categories:
unary operators, normal arithmetic operators, reflected arithmetic operators (more
on this later), augmented assignment, and type conversions.
Unary operators and functions only have one operand, e.g. negation, absolute
value, etc.
But before we get down to the good stuff, a quick word on requirements.
Requirements
Now that we’re talking about creating your own sequences in Python, it’s time to talk
about protocols. Protocols are somewhat similar to interfaces in other languages in
that they give you a set of methods you must define. However, in Python protocols
are totally informal and require no explicit declarations to implement. Rather, they’re
more like guidelines.
Why are we talking about protocols now? Because implementing custom container
types in Python involves using some of these protocols. First, there’s the protocol
for defining immutable containers: to make an immutable container, you need only
define __len__() and __getitem__() (more on these later). The mutable container
protocol requires everything that immutable containers require plus __setitem__()
and __delitem__() . Lastly, if you want your object to be iterable, you’ll have to define
__iter__() , which returns an iterator. That iterator must conform to the iterator
protocol, which requires iterators to have methods called __iter__() (returning
itself) and __next__() (named next() in Python 2).
__len__() : Returns the length of the container. Part of the protocol for both
immutable and mutable containers.
__delitem__() : Defines behavior for when an item is deleted (e.g. del self[key] ).
This is only part of the mutable container protocol. You must raise the
appropriate exceptions when an invalid key is used.
__iter__() : Should return an iterator for the container. Iterators are returned in a
number of contexts, most notably by the iter() built-in function and when a
container is looped over using the form for x in container: . Iterators are their
own objects, and they also must define an __iter__() method that returns self.
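To make the container protocols concrete, here is a minimal sketch of a mutable container. The Deck class and its contents are hypothetical and only serve as an illustration; it delegates every operation to an internal list, so invalid keys raise the list's own IndexError or TypeError.

```python
class Deck:
    """Hypothetical mutable container that stores its items in a plain list."""

    def __init__(self, cards=None):
        self._cards = list(cards) if cards is not None else []

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, key):
        # The underlying list raises IndexError/TypeError for invalid keys
        return self._cards[key]

    def __setitem__(self, key, value):
        self._cards[key] = value

    def __delitem__(self, key):
        del self._cards[key]

    def __iter__(self):
        return iter(self._cards)


deck = Deck(["ace", "king", "queen"])
deck[1] = "jack"        # calls __setitem__
del deck[0]             # calls __delitem__
remaining = list(deck)  # calls __iter__
```

Dropping __setitem__() and __delitem__() from this sketch would leave an immutable container.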
Example
For our example, let’s look at a list that implements some functional constructs:
class FunctionalList:
    '''A class wrapping a list with some extra functional magic, like head,
    tail, init, last, drop, and take.'''
    def __init__(self, values=None):
        if values is None:
            self.values = []
        else:
            self.values = values
    def __len__(self):
        return len(self.values)
    def __iter__(self):
        return iter(self.values)
if instance.equals(other_instance):
    # do something
You could certainly do this in Python, too, but this adds confusion and is
unnecessarily verbose. Different libraries might use different names for the same
operations, making the client do way more work than necessary. With the power of
magic methods, however, we can define one method ( __eq__() , in this case), and
say what we mean instead:
if instance == other_instance:
    # do something
That’s part of the power of magic methods. The vast majority of them allow us to
define meaning for operators so that we can use them on our own classes just like
they were built-in types.
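As a rough illustration of defining == for your own class, consider the sketch below. The Point class is hypothetical and not part of the original example; it only shows one common way to write __eq__().

```python
class Point:
    """Hypothetical 2D point used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Returning NotImplemented lets Python try the other operand's method
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


same = Point(1, 2) == Point(1, 2)       # True: __eq__ compares coordinates
different = Point(1, 2) == Point(3, 4)  # False
```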
with statement context manager
__exit__(self, exc, val, trace)    with self as x:    with statement context manager
Exceptions
An exception is an illegal operation that occurs during the execution of a program.
Exceptions are known to non-programmers as instances that do not conform to a
general rule.
The name “exception” in computer science has this meaning as well – it implies that
the problem (the exception) doesn’t occur frequently, i.e., the exception is the
“exception to the rule”.
Exception Handling
Exception handling is the process of responding to the occurrence of exceptions
(anomalous or exceptional conditions requiring special processing) during the
execution of a program. Since exception handling ensures that the flow of the
program doesn’t break when an exception occurs, it fosters robust code.
Error handling is generally resolved by saving the state of execution at the moment
the error occurred and interrupting the normal flow of the program to execute a
special function or piece of code, which is known as the “exception handler”.
Depending on the kind of error (“division by zero”, “file open error”, etc.) which has
occurred, the error handler can “fix” the problem and the program can be continued
afterwards with the previously saved data.
Terminology:
The code, which harbors the risk of an exception, is embedded within a try
block.
Let’s look at a simple example. Assume that we want to ask the user to enter an
integer. If we use input() , the input will be a string, which will need to be cast into
an integer. If the input isn’t a valid integer, the conversion will raise a ValueError .
which outputs:
With the aid of exception handling, we can write robust code for reading an integer
from input:
while True:
    try:
        n = input("Please enter an integer: ")
        n = int(n)
        break
    except ValueError:
        print("No valid integer! Please try again ...")
print("Great, you successfully entered an integer!")
which outputs:
It’s a loop, which breaks only if a valid integer has been given. The while loop is
entered. The code within the try clause will be executed statement by statement. If
no exception occurs during the execution, the execution will reach the break
statement and the while loop will be left.
If an exception occurs, i.e., in the casting of n , the rest of the try block will be
skipped and the except clause will be executed. The raised error, in this particular
case a ValueError , has to match one of the names after except. After having printed
the text of the print statement, the execution does another loop. It starts with a
new input() .
Our next example shows a try clause, in which we open a file for reading, read a
line from this file and convert this line into an integer. There are at least two possible
exceptions: an IOError , if the file cannot be opened or read, and a ValueError , if the
line does not contain a valid integer.
Just in case, we have an additional unnamed except clause for an unexpected error:
import sys
try:
    f = open('integers.txt')
    s = f.readline()
    i = int(s.strip())
except IOError as e:
    errno, strerror = e.args
    print("I/O error({0}): {1}".format(errno, strerror))
    # e can be printed directly without using .args:
    # print(e)
except ValueError:
    print("No valid integer in line.")
except:
    print("Unexpected error:", sys.exc_info()[0])
    raise
which outputs:
The handling of the IOError in the previous example is of special interest. The
except clause for the IOError specifies a variable e after the exception name
( IOError ).
The variable e is bound to an exception instance with the arguments stored in
instance.args .
If we call the above script with a non-existing file, we get the message:
And if the file integers.txt is not readable, say if we don’t have the permission to
read it, we get the following message:
An except clause may name more than one exception in a tuple of error names, as
we see in the following example:
try:
    f = open('integers.txt')
    s = f.readline()
    i = int(s.strip())
except (IOError, ValueError):
    print("An I/O error or a ValueError occurred")
except:
    print("An unexpected error occurred")
    raise
which outputs:
Here’s what happens if we call a function within a try block and if an exception
occurs inside the function call:
def f():
    x = int("four")
try:
    f()
except ValueError as e:
    print("got it :-) ", e)
print("Let's get on")
which outputs:
got it :-) invalid literal for int() with base 10: 'four'
Let's get on
We modify our example so that the function catches the exception directly:
def f():
    try:
        x = int("four")
    except ValueError as e:
        print("got it in the function :-) ", e)
try:
    f()
except ValueError as e:
    print("got it :-) ", e)
print("Let's get on")
which outputs:
got it in the function :-) invalid literal for int() with base 10: 'four'
Let's get on
As expected, the exception is caught inside the function and not in the caller’s
except clause.
We now add a raise , which generates the ValueError again, so that the exception
will be propagated to the caller:
def f():
    try:
        x = int("four")
    except ValueError as e:
        print("got it in the function :-) ", e)
        raise
try:
    f()
except ValueError as e:
    print("got it :-) ", e)
print("Let's get on")
which outputs:
got it in the function :-) invalid literal for int() with base 10: 'four'
got it :-) invalid literal for int() with base 10: 'four'
Let's get on
Custom Exceptions
It’s possible to create custom exceptions using the raise statement which forces a
specified exception to occur:
which outputs:
The Pythonic way to do this is to define an exception class which inherits from the
Exception class:
which outputs:
MyException Traceback (most recent call last)
<ipython-input-3-d75bff75fe3a> in <module>
2 pass
3
----> 4 raise MyException("An exception doesn't always prove the rule!")
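A custom exception can be caught just like a built-in one. The sketch below reuses the MyException name from the traceback; the risky() helper is a hypothetical function added only for illustration.

```python
class MyException(Exception):
    """Custom exception; the class body can stay empty."""


def risky():
    # Hypothetical function that always fails
    raise MyException("An exception doesn't always prove the rule!")


try:
    risky()
except MyException as e:
    message = str(e)
    print("Caught:", message)
```

Because MyException inherits from Exception, a generic except Exception clause would catch it as well.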
clause.
finally clauses are called clean-up or termination clauses, because they must be
executed under all circumstances, i.e., a finally clause is always executed
regardless of whether an exception occurred in the try block or not. A simple example to
demonstrate the finally clause:
try:
    x = float(input("Your number: "))
    inverse = 1.0 / x
finally:
    print("There may or may not have been an exception.")
print("The inverse: ", inverse)
which outputs:
Your number: 34
There may or may not have been an exception.
The inverse: 0.029411764705882353
finally and except can be used together for the same try block:
try:
    x = float(input("Your number: "))
    inverse = 1.0 / x
except ValueError:
    print("You should have given either an int or a float")
except ZeroDivisionError:
    print("Infinity")
finally:
    print("There may or may not have been an exception.")
try:
    # Use "raise" to raise an error
    raise IndexError("This is an index error")
except IndexError as e:
    pass  # Pass is just a no-op. Usually you would do recovery here.
except (TypeError, NameError):
    pass  # Multiple exceptions can be handled together, if required.
else:  # Optional clause to the try/except block. Must follow all except blocks
    print("All good!")  # Runs only if the code in try raises no exceptions
finally:  # Execute under all circumstances
    print("We can clean up resources here")
which outputs:
Your number: 23
There may or may not have been an exception.
else Clause
The try ... except statement has an optional else clause. An else block has to be
positioned after all the except clauses. An else clause will be executed if the try
clause doesn’t raise an exception.
The following example opens a file and reads in all the lines into a list called “text”:
import sys
file_name = sys.argv[1]
text = []
try:
    fh = open(file_name, 'r')
    text = fh.readlines()
    fh.close()
except IOError:
    print('cannot open', file_name)
if text:
    print(text[100])
which outputs:
This example receives the file name via a command line argument. So make sure
that you call it properly: Let’s assume that you saved this program as
“exception_test.py”. In this case, you have to call it with:
If you don’t want this behavior, just change the line file_name = sys.argv[1] to
file_name = 'integers.txt' .
import sys
file_name = sys.argv[1]
text = []
try:
    fh = open(file_name, 'r')
except IOError:
    print('cannot open', file_name)
else:
    text = fh.readlines()
    fh.close()
if text:
    print(text[100])
which outputs:
The main difference is that in the first case, every statement of the try block can lead
to the same error message “cannot open …”, which is misleading if it is fh.readlines() or
fh.close() that raises the error.
with Statement
Instead of try / finally to clean up resources, you can simply use a with
context manager:
with open("myfile.txt") as f:
    for line in f:
        print(line)
# Writing to a file
contents = {"aa": 12, "bb": 21}
with open("myfile1.txt", "w+") as file:
    file.write(str(contents))  # Writes a string to a file
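To make the relationship between try / finally and with concrete, here is a minimal sketch that does the same clean-up both ways. The temporary directory and file name are assumptions chosen so the sketch does not touch real data.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

# Manual clean-up with try/finally
f = open(path, "w")
try:
    f.write("hello\n")
finally:
    f.close()

# The same guarantee with a context manager
with open(path) as f:
    first_line = f.readline()

closed_after_with = f.closed  # True: the file was closed on leaving the block
```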
try:
    something()
except SomeError as e:
    try:
        plan_B()
    except AlsoFailsError:
        raise e  # or raise e from None - see below
The traceback produced will include an additional notice that SomeError occurred
while handling AlsoFailsError (because of raise e being inside except
AlsoFailsError ). This is misleading because what actually happened is the other
way around - we encountered AlsoFailsError , and handled it, while trying to recover
from SomeError . To obtain a traceback that doesn’t include AlsoFailsError , replace
raise e with raise e from None .
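The effect of raise e from None can be checked programmatically. In the sketch below, the exception classes and the bodies of something() and plan_B() are hypothetical stand-ins for the names used above; the run() wrapper is also an assumption added so the result can be inspected.

```python
class SomeError(Exception):
    pass


class AlsoFailsError(Exception):
    pass


def something():
    raise SomeError("primary operation failed")


def plan_B():
    raise AlsoFailsError("fallback failed too")


def run():
    try:
        something()
    except SomeError as e:
        try:
            plan_B()
        except AlsoFailsError:
            raise e from None  # hide the AlsoFailsError context


try:
    run()
except SomeError as err:
    caught = err

context_hidden = caught.__suppress_context__  # True because of "from None"
```

With __suppress_context__ set, the default traceback display omits the "During handling of the above exception" notice.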
In Python 2 you’d store the exception type, value, and traceback in local variables
and use the three-argument form of raise:
import sys
try:
    something()
except SomeError:
    t, v, tb = sys.exc_info()
    try:
        plan_B()
    except AlsoFailsError:
        raise t, v, tb
Built-in Exceptions
Some of the common built-in exceptions in Python programming along with the
errors that cause them are listed below. An exhaustive list of built-in exceptions in
Python can be found in the Python documentation.
referent.
RuntimeError Raised when an error does not fall under any other category.
We can view all the built-in exceptions using the built-in locals() function as follows:
print(dir(locals()['__builtins__']))
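The listing above also contains functions and constants. As an alternative sketch (not the approach used above), the builtins module can be filtered so that only exception classes remain; the variable name exception_names is our own.

```python
import builtins

# Keep only names bound to classes derived from BaseException
exception_names = sorted(
    name
    for name, obj in vars(builtins).items()
    if isinstance(obj, type) and issubclass(obj, BaseException)
)
print(exception_names[:5])
```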
Modules
You can import modules:
import math
print(math.sqrt(16)) # Prints 4.0
from math import floor # Import a specific function from a module
print(floor(3.7)) # Prints 3
import math as m # Shorten a module name with an alias
math.sqrt(16) == m.sqrt(16) # Returns True
Python modules are just ordinary Python files. You can write your own, and import
them. The name of the module is the same as the name of the file.
You can also find out which functions and attributes are defined in a module using
dir , which is covered in detail in its own section.
import math
dir(math)
A gotcha with Python’s module imports is that if you have a Python script named
math.py in the same folder as your current script, the file math.py will be loaded
instead of the built-in Python module. This happens because the local folder has
priority over Python’s built-in libraries.
Owing to multiple modules within a package (that give a package hierarchy), a
package can be hierarchically imported as:
Thus, regular modules in Python are just “files”, while packages are “directories”.
The distinction between module and package seems to hold just at the file system
level. When you import a module or a package, the corresponding object created by
Python is always of type module . Note, however, when you import a package, only
variables/functions/classes in the __init__.py file of that package are directly visible,
not sub-packages or modules. As an example, consider the xml package in the
Python standard library: its xml directory contains an __init__.py file and four sub-
directories; the sub-directory etree contains an __init__.py file and, among others,
an ElementTree.py file. See what happens when you try to interactively import
package/modules:
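A minimal sketch of importing the xml package and one of its modules is shown below. The variable names are our own; the check for __path__ relies on the fact that packages carry this attribute while regular modules do not.

```python
import xml                      # runs xml/__init__.py
import xml.etree.ElementTree    # runs xml/etree/__init__.py, then ElementTree.py

package_type = type(xml).__name__                    # 'module'
module_type = type(xml.etree.ElementTree).__name__   # 'module'

# Packages carry a __path__ attribute; regular modules do not
xml_is_package = hasattr(xml, "__path__")
elementtree_is_package = hasattr(xml.etree.ElementTree, "__path__")
```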
Regular modules can be “imported” and can be “executed” (as shown in the
examples above). Package modules can also be “imported” and “executed”.
However, you may rightly complain: “But we can’t directly write code in directories!
Code is written in files only!” That is indeed a very good complaint, as it leads us
to the second special thing about package modules. The code for a package
module is written in files inside its directory, and the names of these files are
reserved by Python. If you want to “import” a package module, you’ll have to put its
code in an __init__.py file in its directory, and if you want to “execute” a package
module, you’ll have to put its execution code in a __main__.py file in its
directory.
As an example:
# bar_pack/__init__.py
def talk():
    print("bar")
# bar_pack/__main__.py
import __init__
__init__.talk()
# foo.py
Classes
The syntax for defining classes in Python is straightforward:
class Human:
    # Constructor
    # Note that all methods of a class take "self" as the first argument
    def __init__(self, name):
        # Assign the argument to the instance's name attribute
        self.name = name
        # Initialize property
        self._age = 0
    # Instance method
    def say(self, msg):
        print("{name}: {message}".format(name=self.name, message=msg))
# When a Python interpreter reads a source file it executes all its code.
# This __name__ check makes sure this code block is only executed when this
# module is the main program.
if __name__ == '__main__':
    # Instantiate the class
    i = Human(name="Ian")
    i.say("hi")  # Call an instance method; prints "Ian: hi"
    i.greet()  # Call an instance method; prints "Hello, Ian"
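For reference, here is a self-contained sketch of what such a class could look like in full. The species value and the bodies of greet() , grunt() , and the age property are assumptions made for illustration; only __init__() and say() follow the code shown above.

```python
class Human:
    # Class attribute shared by all instances (value assumed for illustration)
    species = "H. sapiens"

    def __init__(self, name):
        self.name = name
        self._age = 0

    def say(self, msg):
        print("{name}: {message}".format(name=self.name, message=msg))

    def greet(self):
        # Assumed body, chosen to match the "Hello, Ian" output quoted above
        print("Hello, {name}".format(name=self.name))

    @staticmethod
    def grunt():
        # Assumed body; a static method needs neither instance nor class
        return "*grunt*"

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, value):
        self._age = value


i = Human(name="Ian")
i.say("hi")
i.greet()
i.age = 42
```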
Inheritance
Inheritance allows new child classes to be defined that inherit methods and
variables from their parent class.
Using the Human class defined above as the base or parent class, we can define a
child class, Superhero, which inherits the class variables like species , name , and
age , as well as methods, like greet() and grunt() from the Human class, but can
also have its own unique properties.
To take advantage of modularization by file you could place the classes above in
their own files, say, human.py .
To import functions from other files use the following format from "filename-without-
    # If the child class should inherit all of the parent's definitions without
    # any modifications, you can just use the "pass" keyword (and nothing else)
    # but in this case it is commented out to allow for a unique child class:
    # pass
        # The "super" function lets you access the parent class's methods
        # that are overridden by the child, in this case, the __init__ method.
        # This calls the parent class constructor:
        super().__init__(name)
    def boast(self):
        for power in self.superpowers:
            print("I wield the power of {pow}!".format(pow=power))
if __name__ == '__main__':
    sup = Superhero(name="Flash")
    # Get the Method Resolution search Order used by both getattr() and super()
    # This attribute is dynamic and can be updated
    print(Superhero.__mro__)  # Prints "(<class '__main__.Superhero'>,
                              # <class 'human.Human'>, <class 'object'>)"
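The following self-contained sketch shows the same inheritance pattern in a single file. The compact Human class, the constructor defaults, and the return values of greet() and boast() are assumptions made so the example can run on its own.

```python
class Human:
    species = "H. sapiens"  # assumed value

    def __init__(self, name):
        self.name = name

    def greet(self):
        return "Hello, {name}".format(name=self.name)


class Superhero(Human):
    def __init__(self, name, movie=False, superpowers=None):
        # Child-specific attributes (defaults are assumptions)
        self.movie = movie
        self.superpowers = superpowers if superpowers is not None else ["super strength"]
        # Let the parent constructor set the inherited attributes
        super().__init__(name)

    def boast(self):
        return ["I wield the power of {pow}!".format(pow=p) for p in self.superpowers]


sup = Superhero(name="Flash", superpowers=["speed"])
mro_names = [cls.__name__ for cls in Superhero.__mro__]
```

Here greet() is inherited unchanged from Human , while boast() exists only on the child class.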
Multiple Inheritance
Multiple inheritance can be best explained with an example.
# bat.py
class Bat:
    species = 'Baty'
    def __init__(self, can_fly=True):
        # Store whether this bat can fly (read as b.fly below)
        self.fly = can_fly
    # This class also has a say method
    def say(self, msg):
        msg = '... ... ...'
        return msg
if __name__ == '__main__':
    b = Bat()
    print(b.say('hello'))
    print(b.fly)
And another class Batman that inherits from both Superhero and Bat :
# batman.py
from superhero import Superhero
from bat import Bat
# Define Batman as a child that inherits from both Superhero and Bat
class Batman(Superhero, Bat):
    def __init__(self, *args, **kwargs):
        # Typically to inherit attributes you have to call super:
        # super(Batman, self).__init__(*args, **kwargs)
        # However we are dealing with multiple inheritance here, and super()
        # only works with the next base class in the MRO list.
        # So instead we explicitly call __init__ for all ancestors.
        # The use of *args and **kwargs allows for a clean way to pass arguments,
        # with each parent "peeling a layer of the onion".
        Superhero.__init__(self, 'anonymous', movie=True,
                           superpowers=['Wealthy'], *args, **kwargs)
        Bat.__init__(self, *args, can_fly=False, **kwargs)
        # override the value for the name attribute
        self.name = 'Sad Affleck'
if __name__ == '__main__':
    sup = Batman()
    # Get the Method Resolution search Order used by both getattr() and super().
    # This attribute is dynamic and can be updated
    print(Batman.__mro__)  # Prints "(<class '__main__.Batman'>,
                           # <class 'superhero.Superhero'>,
                           # <class 'human.Human'>,
                           # <class 'bat.Bat'>, <class 'object'>)"
    # Inherited attribute from 2nd ancestor whose default value was overridden.
    print('Can I fly? ' + str(sup.fly))  # Prints "Can I fly? False"
Selected Built-ins
In this section, we present a selected set of commonly used built-in functions.
For an exhaustive list of Python’s built-in functions, refer to the Python documentation.
any / all
Python’s any and all functions can be interpreted as a series of logical or and
and operators, respectively.
any returns True if any element of the iterable is true. If the iterable is empty, it returns
False .
all returns True if all elements of the iterable are true. If the iterable is empty, it still
returns True .
any([0, 0.0, False, (), '0']), all([1, 0.0001, True, (False,)]) # Returns (True, True)
If the iterables are empty, any returns False , and all returns True :
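A short sketch of the empty-iterable behavior follows, together with rough pure-Python equivalents of the two built-ins. The helper names my_any() and my_all() are our own and exist only for illustration.

```python
empty_any = any([])   # False: no element is true
empty_all = all([])   # True: no element is false


# Rough pure-Python equivalents of the two built-ins
def my_any(iterable):
    for element in iterable:
        if element:
            return True
    return False


def my_all(iterable):
    for element in iterable:
        if not element:
            return False
    return True
```

Both helpers stop at the first element that decides the result, which is also how the built-ins short-circuit.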
Using the concepts we’ve seen so far,
In summary,
any all
dir
dir() is a powerful built-in function, which returns a list of the attributes and methods
of any object (say functions, modules, strings, lists, dictionaries, etc.)
a = [1, 2, 3]
# dir() will return all the attributes of the "a" list object
dir(a) # Returns ['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', ...]
When a user defined class object with an overridden __dir__() method is passed in
as a parameter, dir() returns a list of the attributes contained in that object:
my_cart = Supermarket()
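A minimal sketch of such a class is shown below. The attribute names returned by __dir__() are hypothetical; note that dir() sorts whatever list the method returns.

```python
class Supermarket:
    def __dir__(self):
        # Hypothetical attribute names, for illustration only
        return ["price", "customer_name", "product"]


my_cart = Supermarket()
listing = dir(my_cart)  # dir() sorts whatever __dir__() returns
```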
If no parameters are passed, it returns a list of names in the current local scope:
import math
import random
# dir() now returns the imported modules added to the local namespace, including all the existing ones as before
dir() # Returns ['__builtins__', '__cached__', '__doc__', '__file__', ..., 'math', 'random']
The most common use-case of dir() is for debugging since it helps list out all the
attributes of the input parameter. This is especially useful when dealing with a
lot of classes and functions in a large project.
dir() also offers information about the operations we can perform with the
parameter that was passed in (class object, module, etc.), which can come in handy
when you have little to no information about the parameter.
eval
filter
filter() constructs an iterator from those elements of the input iterable for which
the input function returns True .
# Function that filters vowels
def filterVowels(variable):
    letters = ['a', 'e', 'i', 'o', 'u']
    return variable in letters
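Here is a short sketch of how such a predicate can be passed to filter() . The input list and the snake_case helper name are assumptions; the list comprehension is shown only as an equivalent alternative.

```python
def filter_vowels(letter):
    return letter in ('a', 'e', 'i', 'o', 'u')


letters = ['a', 'b', 'd', 'e', 'i', 'j', 'o']
# filter() returns a lazy iterator, so wrap it in list() to see the items
vowels = list(filter(filter_vowels, letters))
# The same result with a list comprehension
vowels_again = [letter for letter in letters if filter_vowels(letter)]
```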
isinstance
You can even pass a tuple of classes to be checked for the object:
isinstance() also works for objects of custom-defined classes:
# Parent class
class Vehicles:
    def __init__(self):
        pass
# Child class
class Car(Vehicles):
    # Constructor
    def __init__(self):
        # Call the parent constructor on this instance
        super().__init__()
# Initializing objects
v = Vehicles()
c = Car()
A gotcha with isinstance() is that the bool datatype is a subclass of the int
datatype:
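The isinstance() behaviors described in this section can be summarized in one sketch. The class bodies are reduced to pass and the results dictionary is our own device for collecting the outcomes.

```python
class Vehicles:
    pass


class Car(Vehicles):
    pass


c = Car()
results = {
    "car_is_vehicle": isinstance(c, Vehicles),      # True: Car derives from Vehicles
    "tuple_match": isinstance(5, (str, int)),       # True: matches int
    "tuple_no_match": isinstance(5.0, (str, int)),  # False
    "bool_is_int": isinstance(True, int),           # True: the gotcha
    "exact_type_check": type(True) is int,          # False: excludes subclasses
}
```

When booleans must be excluded, comparing type(x) is int is one possible workaround.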
issubclass
# Parent class
class Vehicles:
    def __init__(self):
        pass
# Child class
class Car(Vehicles):
    # Constructor
    def __init__(self):
        # Call the parent constructor on this instance
        super().__init__()
# Driver's code
issubclass(Car, Vehicles) # Returns True
issubclass(Car, list) # Returns False
Similar to isinstance() , you can pass a tuple of classes to be checked for the class:
Again, similar to isinstance() , a gotcha with issubclass() is that the bool datatype
is a subclass of the int datatype:
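A compact sketch of both points follows; the class bodies are reduced to pass and the variable names are our own.

```python
class Vehicles:
    pass


class Car(Vehicles):
    pass


in_tuple = issubclass(Car, (list, Vehicles))   # True: matches Vehicles
not_in_tuple = issubclass(Car, (list, dict))   # False
bool_is_int = issubclass(bool, int)            # True
class_is_own_subclass = issubclass(Car, Car)   # True
```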
iter
# list of vowels
vowels = ['a', 'e', 'i', 'o', 'u']
vowels_iter = iter(vowels)
class PrintNumber:
    def __init__(self, max):
        self.max = max
    def __iter__(self):
        self.num = 0
        return self
    def __next__(self):
        if self.num >= self.max:
            raise StopIteration
        self.num += 1
        return self.num
print_num = PrintNumber(3)
print_num_iter = iter(print_num)
next(print_num_iter) # Returns 1
next(print_num_iter) # Returns 2
next(print_num_iter) # Returns 3
# Raises StopIteration
next(print_num_iter)
When you run the program, it will open the mydata.txt file in reading mode.
Then, the iter(fp.readline, '') in the for loop calls readline (which reads each
line in the text file) until the sentinel character, '' (empty string), is reached.
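The two-argument form of iter() can be sketched as follows. To keep the example self-contained, an in-memory io.StringIO object with made-up contents stands in for the mydata.txt file.

```python
import io

# In-memory stand-in for open('mydata.txt'); the contents are made up
fp = io.StringIO("first line\nsecond line\n")

lines = []
# readline() returns '' at end of file, which matches the sentinel
for line in iter(fp.readline, ''):
    lines.append(line.rstrip('\n'))
```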
len
Internally, len() calls the object’s __len__() function. You can think of len() as:
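A rough sketch of that idea is shown below. The my_len() function and the Basket class are hypothetical; the real len() additionally checks that the result is a non-negative integer.

```python
def my_len(obj):
    # Look the method up on the type, as Python does for special methods
    return type(obj).__len__(obj)


class Basket:
    """Hypothetical container used only for this sketch."""

    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)


basket = Basket(["apple", "pear"])
```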
testList = []
len(testList) # Returns 0
testList = [1, 2, 3]
len(testList) # Returns 3
testTuple = (1, 2, 3)
len(testTuple) # Returns 3
testString = ''
len(testString) # Returns 0
testString = 'Python'
len(testString) # Returns 6
# Byte object
testByte = b'Python'
len(testByte) # Returns 6
testSet = {1, 2, 3}
len(testSet) # Returns 3
# Empty Set
testSet = set()
len(testSet) # Returns 0
testDict = {}
len(testDict) # Returns 0
# frozenSet
testSet = {1, 2}
frozenTestSet = frozenset(testSet)
len(frozenTestSet) # Returns 2
class Session:
    def __init__(self, number=0):
        self.number = number
    def __len__(self):
        return self.number
s1 = Session()
len(s1) # Returns 0 since the default length is 0
range
# Initializing list using range with 2 parameters: start and stop
list(range(3, 6)) # Returns [3, 4, 5]
# Initializing list using range with 2 parameters: start and stop (negative start)
list(range(-6, 2)) # Returns [-6, -5, -4, -3, -2, -1, 0, 1]
reversed
# A list of numbers
L1 = [1, 2, 3, 4, 1, 2, 6]
L1.reverse()
print(L1) # Prints [6, 2, 1, 4, 3, 2, 1]
# A list of characters
L2 = ['a', 'b', 'c', 'd', 'a', 'a']
L2.reverse()
print(L2) # Prints ['a', 'a', 'd', 'c', 'b', 'a']
Note that the reverse() method is only defined for lists. Calling it on any other
datatype raises an AttributeError (the built-in reversed() function, by contrast, accepts any sequence):
string = "abgedge"
string.reverse() # Raises AttributeError: 'str' object has no attribute 'reverse'
print(string) # Never reached
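For comparison, here is a brief sketch of the built-in reversed() function, which returns a reverse iterator over any sequence and leaves the original untouched. The variable names are our own.

```python
letters = list(reversed(['a', 'b', 'c']))   # ['c', 'b', 'a']
text = ''.join(reversed("abgedge"))         # 'egdegba'
numbers = list(reversed(range(3)))          # [2, 1, 0]
sliced = "abgedge"[::-1]                    # slicing also works for strings
```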
sort
Basics
sorted() returns a new sorted list from the items in an iterable.
To perform a simple ascending sort, just call the sorted() function with the iterable as its only
argument. It returns a new sorted list:
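A minimal sketch, using a made-up list of numbers:

```python
a = [5, 2, 3, 1, 4]
b = sorted(a)
# b is [1, 2, 3, 4, 5]; a keeps its original order
```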
You can also use the <list>.sort() function of a list. It modifies the list in-place
(and returns None to avoid confusion). Usually it’s less convenient than sorted() ,
but if you don’t need the original list, it’s slightly more efficient.
a = [5, 2, 3, 1, 4]
a.sort()
a # Returns [1, 2, 3, 4, 5]
Another difference is that list.sort() is only defined for lists. In contrast, the
sorted() function accepts any iterable.
sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'}) # Returns [1, 2, 3, 4, 5]
Sort Key
Both list.sort() and sorted() accept a key parameter to specify a function to be
called on each list element prior to making comparisons.
The value of the key parameter should be a function that takes a single argument
and returns a key to use for sorting purposes. This technique is fast because the
key function is called exactly once for each input record.
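As a small sketch of the key parameter, the example below sorts a made-up list of words first case-insensitively and then by length.

```python
words = "This is a test string from Andrew".split()
# Compare the lowercased form of each word
case_insensitive = sorted(words, key=str.lower)
# Compare by length; ties keep their original relative order
by_length = sorted(words, key=len)
```

The second result relies on Python's sort being stable, so words of equal length stay in input order.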
A common pattern is to sort complex objects using some of the object’s indices as a
key. You can use a lambda function to access the relevant index of the object’s
attributes. Since these access patterns are very common, Python provides
convenience functions to make accessor functions easier and faster. The operator
module offers itemgetter , which serves this role. For example:
from operator import itemgetter
class Student:
    def __init__(self, name, grade, age):
        self.name = name
        self.grade = grade
        self.age = age
    def __repr__(self):
        return repr((self.name, self.grade, self.age))
    def weighted_grade(self):
        return 'CBA'.index(self.grade) / float(self.age)
student_tuples = [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
# Sort by age using a lambda
sorted(student_tuples, key=lambda student: student[2]) # Returns [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
# Sort by age using itemgetter
sorted(student_tuples, key=itemgetter(2)) # Returns [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
The same technique works for objects with named attributes. Similar to the above
case, you can use a lambda function to access the named attributes within the
object. Alternatively, you can use attrgetter , which is offered by the operator
module. For example:
class Student:
    def __init__(self, name, grade, age):
        self.name = name
        self.grade = grade
        self.age = age
    def __repr__(self):
        return repr((self.name, self.grade, self.age))
    def weighted_grade(self):
        return 'CBA'.index(self.grade) / float(self.age)
student_tuples = [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
student_objects = [Student('john', 'A', 15), Student('jane', 'B', 12), Student('dave', 'B', 10)]
# Sort by age
sorted(student_objects, key=lambda student: student.age) # Returns [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
Note that compared to lambda functions, itemgetter offers succinct syntax, for
example, if you need to get a number of elements at once. For instance,
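Multi-key sorting could look like the sketch below, which sorts by grade and then by age. It reuses the student data from this section; the result variable names are our own.

```python
from operator import attrgetter, itemgetter


class Student:
    def __init__(self, name, grade, age):
        self.name = name
        self.grade = grade
        self.age = age

    def __repr__(self):
        return repr((self.name, self.grade, self.age))


student_tuples = [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
student_objects = [Student(*t) for t in student_tuples]

# Sort by grade first, then by age
by_grade_age = sorted(student_tuples, key=itemgetter(1, 2))
# The lambda equivalent has to build the tuple by hand
by_grade_age_lambda = sorted(student_tuples, key=lambda s: (s[1], s[2]))
# attrgetter does the same for named attributes
objects_by_grade_age = sorted(student_objects, key=attrgetter('grade', 'age'))
```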
Both list.sort() and sorted() accept a reverse parameter with a boolean value,
which can serve as a flag for descending sorts. For example, to get the student data
in reverse age order:
class Student:
    def __init__(self, name, grade, age):
        self.name = name
        self.grade = grade
        self.age = age
    def __repr__(self):
        return repr((self.name, self.grade, self.age))
    def weighted_grade(self):
        return 'CBA'.index(self.grade) / float(self.age)
student_tuples = [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
student_objects = [Student('john', 'A', 15), Student('jane', 'B', 12), Student('dave', 'B', 10)]
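A descending sort by age could then be written as in this sketch, which repeats the tuple data so that it runs on its own; the variable name oldest_first is our own.

```python
from operator import itemgetter

student_tuples = [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
# reverse=True flips the ordering produced by the key function
oldest_first = sorted(student_tuples, key=itemgetter(2), reverse=True)
```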
For sorting examples and a brief sorting tutorial, see Python’s Sorting HowTo wiki.
zip
zip() aggregates elements from each of its inputs. It returns a zip object, which is
an iterator of tuples where the tuple contains the element from each of the input
iterables.
As a simple example,
a = [1, 2]
b = [4, 5]
x = [1, 2, 3]
y = [4, 5, 6]
zipped = zip(x, y)
print(list(zipped)) # Prints [(1, 4), (2, 5), (3, 6)]
x2, y2 = zip(*zip(x, y))
x == list(x2) and y == list(y2) # Returns True
zip() can work with iterables of different lengths, in which case the iterator with the
least items decides the length of its output. In other words, zip() stops iterating
when the shortest-length input iterable is exhausted.
zip() should only be used with unequal length inputs when you don’t care
about trailing, unmatched values from the longer iterables. If those values are
important, use itertools.zip_longest() instead.
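The difference between the two functions can be seen in this short sketch with made-up inputs of unequal length:

```python
from itertools import zip_longest

x = [1, 2, 3]
y = ['a', 'b']

truncated = list(zip(x, y))                    # stops after the shorter input
padded = list(zip_longest(x, y))               # missing values become None
padded_custom = list(zip_longest(x, y, fillvalue='?'))
```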
To read more about the zip() function, refer to Python’s built-in functions
documentation.