Python Programming for Economics and Finance
CONTENTS

I Introduction to Python
1 About These Lectures
  1.1 Overview
  1.2 What’s Python?
  1.3 Scientific Programming with Python
2 Getting Started
  2.1 Overview
  2.2 Python in the Cloud
  2.3 Local Install
  2.4 Jupyter Notebooks
  2.5 Installing Libraries
  2.6 Working with Python Files
  2.7 Exercises
3 An Introductory Example
  3.1 Overview
  3.2 The Task: Plotting a White Noise Process
  3.3 Version 1
  3.4 Alternative Implementations
  3.5 Another Application
  3.6 Exercises
4 Functions
  4.1 Overview
  4.2 Function Basics
  4.3 Defining Functions
  4.4 Applications
  4.5 Recursive Function Calls (Advanced)
  4.6 Exercises
  4.7 Advanced Exercises
5 Python Essentials
  5.1 Overview
  5.2 Data Types
  5.3 Input and Output
  5.4 Iterating
  5.5 Comparisons and Logical Operators
  5.6 Coding Style and Documentation
  5.7 Exercises
6 OOP I: Objects and Methods
  6.1 Overview
  6.2 Objects
  6.3 Inspection Using Rich
  6.4 A Little Mystery
  6.5 Summary
  6.6 Exercises
11 NumPy
  11.1 Overview
  11.2 NumPy Arrays
  11.3 Arithmetic Operations
  11.4 Matrix Multiplication
  11.5 Broadcasting
  11.6 Mutability and Copying Arrays
  11.7 Additional Functionality
  11.8 Exercises
12 Matplotlib
  12.1 Overview
  12.2 The APIs
  12.3 More Features
  12.4 Further Reading
  12.5 Exercises
13 SciPy
  13.1 Overview
  13.2 SciPy versus NumPy
  13.3 Statistics
  13.4 Roots and Fixed Points
  13.5 Optimization
  13.6 Integration
  13.7 Linear Algebra
  13.8 Exercises
14 Pandas
  14.1 Overview
  14.2 Series
  14.3 DataFrames
  14.4 On-Line Data Sources
  14.5 Exercises
16 SymPy
  16.1 Overview
  16.2 Getting Started
  16.3 Symbolic Algebra
  16.4 Symbolic Calculus
  16.5 Plotting
  16.6 Application: Two-person Exchange Economy
  16.7 Exercises
18 Parallelization
  18.1 Overview
  18.2 Types of Parallelization
  18.3 Implicit Multithreading in NumPy
  18.4 Multithreaded Loops in Numba
  18.5 Exercises
19 JAX

V Other
23 Troubleshooting
  23.1 Fixing Your Local Environment
  23.2 Reporting an Issue
Index
Python Programming for Economics and Finance
This website presents a set of lectures on Python programming for economics and finance.
This is the first text in the series, which focuses on programming in Python.
For an overview of the series, see this page.
• Introduction to Python
– About These Lectures
– Getting Started
– An Introductory Example
– Functions
– Python Essentials
– OOP I: Objects and Methods
– Names and Namespaces
– OOP II: Building Classes
– Writing Longer Programs
• The Scientific Libraries
– Python for Scientific Computing
– NumPy
– Matplotlib
– SciPy
– Pandas
– Pandas for Panel Data
– SymPy
• High Performance Computing
– Numba
– Parallelization
– JAX
• Advanced Python Programming
– Writing Good Code
– More Language Features
– Debugging and Handling Errors
• Other
– Troubleshooting
– Execution Statistics
Part I
Introduction to Python
CHAPTER
ONE
ABOUT THESE LECTURES
“Python has gotten sufficiently weapons grade that we don’t descend into R anymore. Sorry, R people. I used
to be one of you but we no longer descend into R.” – Chris Wiggins
1.1 Overview
This lecture series will teach you to use Python for scientific computing, with a focus on economics and finance.
The series is aimed at Python novices, although experienced users will also find useful content in later lectures.
In this lecture we will
• introduce Python,
• showcase some of its abilities,
• discuss the connection between Python and AI,
• explain why Python is our favorite language for scientific computing, and
• point you to the next steps.
You do not need to understand everything you see in this lecture – we will work through the details slowly later in the
lecture series.
It’s tempting to think that in the age of AI we don’t need to learn how to code.
No!
It’s true that AIs like ChatGPT and other LLMs are wonderful productivity tools for coders.
In fact an AI can be a great companion for these lectures – try copy-pasting some code from this series and asking the AI to
explain it to you.
AIs will certainly help you write pieces of code that you can combine.
But AIs cannot completely and reliably solve a new problem that they haven’t seen before!
You will need to be the supervisor – and for that you need to be able to read, write, and understand computer code.
1.2 What’s Python?

Python is, without doubt, one of the most popular programming languages.
Python libraries like pandas and Polars are replacing familiar tools like Excel and VBA as an essential skill in the fields
of finance and banking.
Moreover, Python is extremely popular within the scientific community – especially in AI.
The following chart, produced using Stack Overflow Trends, provides some evidence.
It shows the popularity of a Python AI library called PyTorch relative to MATLAB.
The chart shows that MATLAB’s popularity has faded, while PyTorch is growing rapidly.
Moreover, PyTorch is just one of the thousands of Python libraries available for scientific computing.
1.2.3 Features
Python is a high-level language, which means it is relatively easy to read, write and debug.
It has a relatively small core language that is easy to learn.
This core is supported by many libraries, which you can learn to use as required.
Python is very beginner-friendly
• suitable for students learning programming
• used in many undergraduate and graduate programs
Other features of Python:
• multiple programming styles are supported (procedural, object-oriented, functional, etc.)
One reason for Python’s popularity is its simple and elegant design — we’ll see many examples later on.
To get a feeling for this, let’s look at an example.
The code below is written in Java rather than Python.
You do not need to read and understand this code!
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CSVAverage {
    public static void main(String[] args) throws IOException {
        double sum = 0;
        int count = 0;
        // Read data.csv line by line, accumulating values from the second column
        BufferedReader reader = new BufferedReader(new FileReader("data.csv"));
        String line;
        while ((line = reader.readLine()) != null) {
            String[] fields = line.split(",");
            try {
                sum += Double.parseDouble(fields[1].trim());
                count++;
            } catch (NumberFormatException | ArrayIndexOutOfBoundsException e) {
                // Skip rows without valid numeric data in the second column
            }
        }
        reader.close();
        if (count > 0) {
            double average = sum / count;
            System.out.println("Average of the second column: " + average);
        } else {
            System.out.println("No valid numeric data found in the second column.");
        }
    }
}
This Java code opens an imaginary file called data.csv and computes the mean of the values in the second column.
Even without knowing Java, you can see that the program is long and complex.
Here’s Python code that does the same thing.
Even if you don’t yet know Python, you can see that the code is simpler and easier to read.
import csv

total, count = 0, 0
with open('data.csv', mode='r') as file:
    reader = csv.reader(file)
    for row in reader:
        try:
            total += float(row[1])
            count += 1
        except (ValueError, IndexError):
            pass

print(f"Average: {total / count if count else 'No valid data'}")
The simplicity of Python and its neat design are a big factor in its popularity.
Unless you have been living under a rock and avoiding all contact with the modern world, you will know that AI is rapidly
advancing.
AI is already remarkably good at helping you write code, as discussed above.
No doubt AI will take over many tasks currently performed by humans, just as other forms of machinery have done
over the past few centuries.
Python is playing a huge role in the advance of AI and machine learning.
This means that tech firms are pouring money into the development of extremely powerful Python libraries.
Even if you don’t plan to work on AI and machine learning, you can benefit from learning to use some of these libraries
for your own projects in economics, finance and other fields of science.
These lectures will explain how.
We have already discussed the importance of Python for AI, machine learning and data science
Let’s take a look at the role of Python in other areas of scientific computing.
Python is either the dominant player or a major player in
• astronomy
• chemistry
• computational biology
• meteorology
• natural language processing
• etc.
Use of Python is also rising in economics, finance, and adjacent fields like operations research – which were previously
dominated by MATLAB / Excel / STATA / C / Fortran.
1.3 Scientific Programming with Python

This section briefly showcases some examples of Python for general scientific programming.
1.3.1 NumPy
One of the most important parts of scientific computing is working with data.
Data is often stored in matrices, vectors and arrays.
We can create a simple array of numbers with pure Python as follows:
a = [-3.14, 0, 3.14]
a
[-3.14, 0, 3.14]
This array is very small so it’s fine to work with pure Python.
But when we want to work with larger arrays in real programs we need more efficiency and more tools.
For this we need to use libraries for working with arrays.
For Python, the most important matrix and array processing library is NumPy library.
For example, let’s build a NumPy array with 100 elements
import numpy as np

a = np.linspace(-np.pi, np.pi, 100)  # Create an even grid from -π to π
Now let’s apply functions to this array and take the inner product of the results
b = np.cos(a)   # Apply cosine to each element of a
c = np.sin(a)   # Apply sin to each element of a
b @ c
9.853229343548264e-16
While NumPy is still the king of array processing in Python, there are now important competitors.
Libraries such as JAX, PyTorch, and CuPy also have built-in array types and array operations that can be very fast and
efficient.
In fact these libraries are better at exploiting parallelization and fast hardware, as we’ll explain later in this series.
However, you should still learn NumPy first because
• NumPy is simpler and provides a strong foundation, and
• libraries like JAX directly extend NumPy functionality and hence are easier to learn when you already know NumPy (see the sketch below).
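As a small illustration of that last point, here is a sketch of how jax.numpy mirrors the NumPy interface (this assumes the jax package is installed; it is not needed elsewhere in this lecture):
import numpy as np
import jax.numpy as jnp

a = np.linspace(0, 1, 5)    # a NumPy array
b = jnp.linspace(0, 1, 5)   # the same call in JAX returns a JAX array

# Familiar NumPy-style operations work on both
print(np.sum(a**2))
print(jnp.sum(b**2))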
1.3.3 SciPy
The SciPy library is built on top of NumPy and provides additional functionality.
For example, let’s calculate ∫_{−2}^{2} 𝜙(𝑧) 𝑑𝑧 where 𝜙 is the standard normal density.
from scipy.stats import norm
from scipy.integrate import quad

ϕ = norm()
value, error = quad(ϕ.pdf, -2, 2)  # Integrate using Gaussian quadrature
value
0.9544997361036417
SciPy includes many of the standard routines used in
• linear algebra
• integration
• interpolation
• optimization
• distributions and statistical techniques
• signal processing
See them all here.
Later we’ll discuss SciPy in more detail.
1.3.4 Graphics
The most popular and comprehensive Python library for creating figures and graphs is Matplotlib.
(Figure: an example Matplotlib 3D plot)
More examples can be found in the Matplotlib thumbnail gallery.
Other graphics libraries include
• Plotly
• seaborn — a high-level interface for matplotlib
• Altair
• Bokeh
You can visit the Python Graph Gallery for more example plots drawn using a variety of libraries.
1.3.5 Networks and Graphs
The study of networks and graphs is becoming an important part of scientific work in economics, finance and other fields.
For example, we might be interested in studying
• production networks
• networks of banks and financial institutions
• friendship and social networks
• etc.
(We have a book on economic networks if you would like to learn more.)
Python has many libraries for studying networks and graphs.
One well-known example is NetworkX.
Its features include, among many other things:
• standard graph algorithms for analyzing networks
• plotting routines
Here’s some example code that generates and plots a random graph, with node color determined by the shortest path
length from a central node.
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

np.random.seed(1234)
As discussed above, there are literally thousands of scientific libraries for Python.
Some are small and do very specific tasks.
Others are huge in terms of lines of code and investment from coders and tech firms.
Here’s a short list of some important scientific libraries for Python not mentioned above.
• SymPy for symbolic algebra, including limits, derivatives and integrals
• statsmodels for statistical routines
• scikit-learn for machine learning
• Keras for machine learning
• Pyro and PyStan for Bayesian data analysis
• GeoPandas for spatial data analysis
• Dask for parallelization
• Numba for making Python run at the same speed as native machine code
• CVXPY for convex optimization
• scikit-image and OpenCV for processing and analysing image data
• BeautifulSoup for extracting data from HTML and XML files
In this lecture series we will learn how to use many of these libraries for scientific computing tasks in economics and
finance.
CHAPTER
TWO
GETTING STARTED
2.1 Overview
2.2 Python in the Cloud

The easiest way to get started coding in Python is by running it in the cloud.
(That is, by using a remote server that already has Python installed.)
One option that’s both free and reliable is Google Colab.
Colab also has the advantage of providing GPUs, which we will make use of in more advanced lectures.
Tutorials on how to get started with Google Colab can be found by web and video searches.
Most of our lectures include a “Launch notebook” button (with a play icon) on the top right that connects you to an
executable version on Colab.
2.3 Local Install

Local installs are preferable if you have access to a suitable machine and plan to do a substantial amount of Python
programming.
At the same time, local installs require more work than a cloud option like Colab.
The rest of this lecture runs you through some of the details associated with local installs.
The core Python package is easy to install but not what you should choose for these lectures.
These lectures require the entire scientific programming ecosystem, which
• the core installation doesn’t provide
• is painful to install one piece at a time.
Hence the best approach for our purposes is to install a Python distribution that contains
1. the core Python language and
2. compatible versions of the most popular scientific libraries.
The best such distribution is Anaconda Python.
Anaconda is
• very popular
• cross-platform
• comprehensive
• completely unrelated to the Nicki Minaj song of the same name
Anaconda also comes with a package management system to organize your code libraries.
All of what follows assumes that you adopt this recommendation!
Anaconda supplies a tool called conda to manage and upgrade your Anaconda packages.
One conda command you should execute regularly is the one that updates the whole Anaconda distribution.
As a practice run, please execute the following
1. Open up a terminal
2. Type conda update anaconda
For more information on conda, type conda help in a terminal.
2.4 Jupyter Notebooks

Jupyter notebooks are one of the many possible ways to interact with Python and the scientific libraries.
They use a browser-based interface to Python with
• The ability to write and execute Python commands.
• Formatted output in the browser, including tables, figures, animation, etc.
• The option to mix in formatted text and mathematical expressions.
Because of these features, Jupyter is now a major player in the scientific computing ecosystem.
Here’s an image showing execution of some code (borrowed from here) in a Jupyter notebook
While Jupyter isn’t the only way to code in Python, it’s great for when you wish to
• start coding in Python
• test new ideas or interact with small pieces of code
• use powerful online interactive environments such as Google Colab
• share or collaborate on scientific ideas with students or colleagues
These lectures are designed for executing in Jupyter notebooks.
Once you have installed Anaconda, you can start the Jupyter notebook.
Either
• search for Jupyter in your applications menu, or
• open up a terminal and type jupyter notebook
– Windows users should substitute “Anaconda command prompt” for “terminal” in the previous line.
If you use the second option, you will see something like this
The output tells us the notebook is running at http://localhost:8888/
• localhost is the name of the local machine
• 8888 refers to port number 8888 on your computer
Thus, the Jupyter kernel is listening for Python commands on port 8888 of our local machine.
Hopefully, your default browser has also opened up with a web page that looks something like this
What you see here is called the Jupyter dashboard.
If you look at the URL at the top, it should be localhost:8888 or similar, matching the message above.
Assuming all this has worked OK, you can now click on New at the top right and select Python 3 or similar.
Here’s what shows up on our machine:
The notebook displays an active cell, into which you can type Python commands.
Let’s start with how to edit code and run simple programs.
Running Cells
Notice that, in the previous figure, the cell is surrounded by a green border.
This means that the cell is in edit mode.
In this mode, whatever you type will appear in the cell with the flashing cursor.
When you’re ready to execute the code in a cell, hit Shift-Enter instead of the usual Enter.
Note: There are also menu and button options for running code in a cell that you can find by exploring.
Modal Editing
The next thing to understand about the Jupyter notebook is that it uses a modal editing system.
This means that the effect of typing at the keyboard depends on which mode you are in.
The two modes are
1. Edit mode
• Indicated by a green border around one cell, plus a blinking cursor
• Whatever you type appears as is in that cell
2. Command mode
• The green border is replaced by a blue border
• Keystrokes are interpreted as commands (for example, typing b adds a new cell below the current one)
Python supports unicode, allowing the use of characters such as 𝛼 and 𝛽 as names in your code.
In a code cell, try typing \alpha and then hitting the tab key on your keyboard.
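For example, after converting \alpha and \beta with Tab, a cell like the following runs fine (a small illustration; the names and values are arbitrary):
α = 0.05
β = 1 - α
print(α, β)

0.05 0.95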
A Test Program
Let’s run a test program: the polar bar demo from the Matplotlib gallery.
import numpy as np
import matplotlib.pyplot as plt

# Fixing random state for reproducibility
np.random.seed(19680801)

# Compute pie slices
N = 20
θ = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
radii = 10 * np.random.rand(N)
width = np.pi / 4 * np.random.rand(N)
colors = plt.cm.viridis(radii / 10.)

ax = plt.subplot(111, projection='polar')
ax.bar(θ, radii, width=width, bottom=0.0, color=colors, alpha=0.5)
plt.show()
Don’t worry about the details for now — let’s just run it and see what happens.
The easiest way to run this code is to copy and paste it into a cell in the notebook.
Hopefully you will get a similar plot.
Tab Completion
Another nice feature of Jupyter is tab completion: for example, type np.random.ra in a code cell and hit Tab to see completions such as np.random.rand and np.random.randn.
On-Line Help
To get help on a function or object, add a question mark after it and run the cell: for example, np.random.randn? brings up its docstring.
Other Content
In addition to executing code, the Jupyter notebook allows you to embed text, equations, figures and even videos in the
page.
For example, we can enter a mixture of plain text and LaTeX instead of code.
Next we hit Esc to enter command mode and then type m to indicate that we are writing Markdown, a mark-up language
similar to (but simpler than) LaTeX.
(You can also use your mouse to select Markdown from the Code drop-down box just below the list of menu items)
Now we hit Shift+Enter to produce this
Note: You may also need to open the Debugger Panel (View -> Debugger Panel).
You can set breakpoints by clicking on the line number of the cell you want to debug.
When you run the cell, the debugger will stop at the breakpoint.
You can then step through the code line by line using the “Next” button on the CALLSTACK toolbar
(located in the right hand window).
You can explore more functionality of the debugger in the Jupyter documentation.
Notebook files are just text files structured in JSON and typically ending with .ipynb.
You can share them in the usual way that you share files — or by using web services such as nbviewer.
The notebooks you see on that site are static html representations.
To run one, download it as an ipynb file by clicking on the download icon at the top right.
Save it somewhere, navigate to it from the Jupyter dashboard and then run as discussed above.
Note: If you are interested in sharing notebooks containing interactive content, you might want to check out Binder.
To collaborate with other people on notebooks, you might want to take a look at
• Google Colab
• Kaggle
To keep the code private and to use the familiar JupyterLab and Notebook interface, look into the JupyterLab Real-Time
Collaboration extension.
QuantEcon has its own site for sharing Jupyter notebooks related to economics – QuantEcon Notes.
Notebooks submitted to QuantEcon Notes can be shared with a link, and are open to comments and votes by the com-
munity.
2.5 Installing Libraries

Most of the libraries we need come with Anaconda.
Other libraries can be installed with pip: for example, you can install the quantecon library by typing
!pip install --upgrade quantecon
into a cell.
Alternatively, you can type the following into a terminal
pip install --upgrade quantecon
2.6 Working with Python Files

So far we’ve focused on executing Python code entered into a Jupyter notebook cell.
Traditionally most Python code has been run in a different way.
Code is first saved in a text file on a local machine.
By convention, these text files have a .py extension.
We can create an example of such a file as follows:
%%writefile foo.py
print("foobar")
Writing foo.py
This writes the line print("foobar") into a file called foo.py in the local directory.
Here %%writefile is an example of a cell magic.
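One simple way to execute this file from inside a notebook is the %run magic (a quick illustration; %run is a standard IPython magic):
%run foo.py

foobar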
If you come across code saved in a *.py file, you’ll need to consider the following questions:
1. how should you execute it?
2. How should you modify or edit it?
Option 1: JupyterLab
JupyterLab is an environment that combines Jupyter notebooks with text editors and more, letting you open, edit and run .py files in one place.
It comes with the Anaconda distribution.

Option 2: Using a Text Editor
One can also edit files using a text editor and then run them from within Jupyter notebooks.
A text editor is an application that is specifically designed to work with text files — such as Python programs.
Nothing beats the power and efficiency of a good text editor for working with program text.
A good text editor will provide
• efficient text editing commands (e.g., copy, paste, search and replace)
• syntax highlighting, etc.
2.7 Exercises
Exercise 2.7.1
If Jupyter is still running, quit by using Ctrl-C at the terminal where you started it.
Now launch again, but this time using jupyter notebook --no-browser.
This should start the kernel without launching the browser.
Note also the startup message: It should give you a URL such as http://localhost:8888 where the notebook is
running.
Now
1. Start your browser — or open a new tab if it’s already running.
2. Enter the URL from above (e.g. http://localhost:8888) in the address bar at the top.
You should now be able to run a standard Jupyter notebook session.
This is an alternative way to start the notebook that can also be handy.
This can also work when you accidentally close the webpage as long as the kernel is still running.
CHAPTER
THREE
AN INTRODUCTORY EXAMPLE
3.1 Overview
Suppose we want to simulate and plot the white noise process 𝜖0 , 𝜖1 , … , 𝜖𝑇 , where each draw 𝜖𝑡 is independent standard
normal.
In other words, we want to generate figures that look something like this:
(Here 𝑡 is on the horizontal axis and 𝜖𝑡 is on the vertical axis.)
We’ll do this in several different ways, each time learning something more about Python.
3.3 Version 1
Here are a few lines of code that perform the task we set
import numpy as np
import matplotlib.pyplot as plt
ϵ_values = np.random.randn(100)
plt.plot(ϵ_values)
plt.show()
3.3.1 Imports
The first two lines of the program import functionality from external code libraries.
The first line imports NumPy, a favorite Python package for tasks like
• working with arrays (vectors and matrices)
• common mathematical functions like cos and sqrt
• generating random numbers
• linear algebra, etc.
After import numpy as np we have access to these attributes via the syntax np.attribute.
Here are two more examples
np.sqrt(4)
2.0
np.log(4)
1.3862943611198906
Packages
NumPy is a Python package: a directory of Python code files bundled together. We can check where it lives on our machine as follows
import numpy as np

print(np.__file__)
Subpackages
Consider this code, which calls the sqrt function from within the NumPy package
import numpy as np

np.sqrt(4)
2.0
We can also import the function directly, after which it can be called without the np. prefix
from numpy import sqrt

sqrt(4)
2.0
Returning to our program that plots white noise, the remaining three lines after the import statements are
ϵ_values = np.random.randn(100)
plt.plot(ϵ_values)
plt.show()
The first line generates 100 (quasi) independent standard normals and stores them in ϵ_values.
The next two lines generate the plot.
We can and will look at various ways to configure and improve this plot below.
3.4 Alternative Implementations

Let’s try writing some alternative versions of our first program, which plotted IID draws from the standard normal distribution.
The programs below are less efficient than the original one, and hence somewhat artificial.
But they do help us illustrate some important Python syntax and semantics in a familiar setting.
ts_length = 100
ϵ_values = []   # empty list

for i in range(ts_length):
    e = np.random.randn()
    ϵ_values.append(e)

plt.plot(ϵ_values)
plt.show()
In brief,
• The first line sets the desired length of the time series.
• The next line creates an empty list called ϵ_values that will store the 𝜖𝑡 values as we generate them.
• The statement # empty list is a comment, and is ignored by Python’s interpreter.
• The next three lines are the for loop, which repeatedly draws a new random number 𝜖𝑡 and appends it to the end
of the list ϵ_values.
• The last two lines generate the plot and display it to the user.
Let’s study some parts of this program in more detail.
3.4.2 Lists
Consider the following example, which creates a list containing mixed types
x = [10, 'foo', False]
type(x)
list
The first element of x is an integer, the next is a string, and the third is a Boolean value.
x.append(2.5)
x
[10, 'foo', False, 2.5]
Here append() is what’s called a method, which is a function “attached to” an object—in this case, the list x.
We’ll learn all about methods later on, but just to give you some idea,
• Python objects such as lists, strings, etc. all have methods that are used to manipulate data contained in the object.
• String objects have string methods, list objects have list methods, etc.
Another useful list method is pop()
x.pop()
2.5
Lists in Python are zero-based (as in C, Java or Go), so the first element is referenced by x[0]
x[0]
10
x[1]  # Second element of x
'foo'
Now let’s consider the for loop from the program above, which was
for i in range(ts_length):
    e = np.random.randn()
    ϵ_values.append(e)
Python executes the two indented lines ts_length times before moving on.
These two lines are called a code block, since they comprise the “block” of code that we are looping over.
Unlike most other languages, Python knows the extent of the code block only from indentation.
In our program, indentation decreases after line ϵ_values.append(e), telling Python that this line marks the lower
limit of the code block.
More on indentation below—for now, let’s look at another example of a for loop
animals = ['dog', 'cat', 'bird']
for animal in animals:
    print("The plural of " + animal + " is " + animal + "s")

The plural of dog is dogs
The plural of cat is cats
The plural of bird is birds
This example helps to clarify how the for loop works: When we execute a loop of the form
for variable_name in sequence:
    <code block>
the Python interpreter binds variable_name to each element of the sequence in turn and then executes the code block.
In discussing the for loop, we explained that the code blocks being looped over are delimited by indentation.
In fact, in Python, all code blocks (i.e., those occurring inside loops, if clauses, function definitions, etc.) are delimited
by indentation.
Thus, unlike most other languages, whitespace in Python code affects the output of the program.
Once you get used to it, this is a good thing: It
• forces clean, consistent indentation, improving readability
• removes clutter, such as the brackets or end statements used in other languages
On the other hand, it takes a bit of care to get right, so please remember:
• The line before the start of a code block always ends in a colon
– for i in range(10):
– if x > y:
The for loop is the most common technique for iteration in Python.
But, for the purpose of illustration, let’s modify the program above to use a while loop instead.
ts_length = 100
ϵ_values = []
i = 0
while i < ts_length:
    e = np.random.randn()
    ϵ_values.append(e)
    i = i + 1
plt.plot(ϵ_values)
plt.show()
A while loop will keep executing the code block delimited by indentation as long as the specified condition (i < ts_length) remains true.
In this case, the program will keep adding values to the list ϵ_values until i equals ts_length:
i == ts_length  # Check this after the loop has finished
True
Note that
• the code block for the while loop is again delimited only by indentation.
• the statement i = i + 1 can be replaced by i += 1.
3.5 Another Application

Let’s do one more application before we turn to exercises: plotting the balance of a bank account over time.
There are no withdrawals over the time period, the last date of which is denoted by T.
The initial balance is b[0] and the interest rate is r, so the balance updates from period t to t+1 according to b[t+1] = (1 + r) * b[t].
In the code below, we generate and plot the sequence of balances.
r = 0.025         # interest rate
T = 50            # end date
b = np.empty(T+1) # an empty NumPy array, to store all b_t
b[0] = 10         # initial balance

for t in range(T):
    b[t+1] = (1 + r) * b[t]

plt.plot(b, label='bank balance')
plt.legend()
plt.show()
The statement b = np.empty(T+1) allocates storage in memory for T+1 (floating point) numbers.
These numbers are filled in by the for loop.
Allocating memory at the start is more efficient than using a Python list and append, since the latter must repeatedly
ask for storage space from the operating system.
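As a side note, np.empty allocates storage without initializing it, so the array holds arbitrary values until you fill them. Here is a small illustrative sketch (the array size is our own choice):
import numpy as np

b = np.empty(3)   # allocates storage without initializing it
print(b)          # contents are arbitrary "junk" values at this point

b[:] = 0          # writing into the array replaces the junk
print(b)          # [0. 0. 0.]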
Notice that we added a legend to the plot — a feature you will be asked to use in the exercises.
3.6 Exercises
Now we turn to exercises. It is important that you complete them before continuing, since they present new concepts we
will need.
Exercise 3.6.1
Your first task is to simulate and plot the correlated time series
𝑥_{𝑡+1} = 𝛼 𝑥_𝑡 + 𝜖_{𝑡+1}, where 𝑥_0 = 0 and 𝑡 = 0, …, 𝑇
The sequence of shocks {𝜖_𝑡} is assumed to be IID and standard normal, with 𝑇 = 200 and 𝛼 = 0.9.
Here’s one solution:
import numpy as np
import matplotlib.pyplot as plt
α = 0.9
T = 200
x = np.empty(T+1)
x[0] = 0

for t in range(T):
    x[t+1] = α * x[t] + np.random.randn()

plt.plot(x)
plt.show()
Exercise 3.6.2
Starting with your solution to exercise 1, plot three simulated time series, one for each of the cases 𝛼 = 0, 𝛼 = 0.8 and
𝛼 = 0.98.
Use a for loop to step through the 𝛼 values.
If you can, add a legend, to help distinguish between the three time series.
Hint:
• If you call the plot() function multiple times before calling show(), all of the lines you produce will end up
on the same figure.
• For the legend, note that if var = 42, then the expression f'foo{var}' evaluates to 'foo42'.
α_values = [0.0, 0.8, 0.98]
T = 200
x = np.empty(T+1)

for α in α_values:
    x[0] = 0
    for t in range(T):
        x[t+1] = α * x[t] + np.random.randn()
    plt.plot(x, label=f'$\\alpha = {α}$')

plt.legend()
plt.show()
Note: f'$\\alpha = {α}$' in the solution is an application of an f-string, which allows you to use {} to contain an
expression.
The contained expression will be evaluated, and the result will be placed into the string.
Exercise 3.6.3
Similar to the previous exercise, simulate and plot the time series
𝑥_{𝑡+1} = 𝛼 |𝑥_𝑡| + 𝜖_{𝑡+1}, where 𝑥_0 = 0, 𝑇 = 200 and 𝛼 = 0.9
Search online for a function that can be used to compute the absolute value |𝑥_𝑡|.
Here’s one solution:
α = 0.9
T = 200
x = np.empty(T+1)
x[0] = 0

for t in range(T):
    x[t+1] = α * np.abs(x[t]) + np.random.randn()

plt.plot(x)
plt.show()
Exercise 3.6.4
One important aspect of essentially all programming languages is branching and conditions.
In Python, conditions are usually implemented with if–else syntax.
Here’s an example that prints -1 for each negative number in an array and 1 for each nonnegative number
numbers = [-9, 2.3, -11, 0]

for x in numbers:
    if x < 0:
        print(-1)
    else:
        print(1)
-1
1
-1
1
Now, write a new solution to Exercise 3.6.3 that does not use an existing function to compute the absolute value.
Replace this existing function with an if–else condition.
α = 0.9
T = 200
x = np.empty(T+1)
x[0] = 0

for t in range(T):
    if x[t] < 0:
        abs_x = - x[t]
    else:
        abs_x = x[t]
    x[t+1] = α * abs_x + np.random.randn()

plt.plot(x)
plt.show()
Here’s a shorter way to write the same thing, using Python’s one-line if–else expression:
α = 0.9
T = 200
x = np.empty(T+1)
x[0] = 0

for t in range(T):
    abs_x = - x[t] if x[t] < 0 else x[t]
    x[t+1] = α * abs_x + np.random.randn()

plt.plot(x)
plt.show()
Exercise 3.6.5
Here’s a harder exercise, that takes some thought and planning.
The task is to compute an approximation to 𝜋 using Monte Carlo.
Use no imports besides
import numpy as np
Your hints are as follows:
• If 𝑈 is a bivariate uniform random variable on the unit square (0, 1)², then the probability that 𝑈 lies in a subset 𝐵 of (0, 1)² is equal to the area of 𝐵.
• If 𝑈_1, …, 𝑈_𝑛 are IID copies of 𝑈, then, as 𝑛 gets large, the fraction that falls in 𝐵 converges to the probability of landing in 𝐵.
• For a circle, area = π * radius².
Here’s one solution:
n = 1000000  # sample size for Monte Carlo simulation
count = 0
for i in range(n):
    # draw a random point on the unit square
    u, v = np.random.uniform(), np.random.uniform()
    # distance from the centre of the square
    d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
    # count the point if it falls in the inscribed circle of radius 0.5
    if d < 0.5:
        count += 1

area_estimate = count / n
print(area_estimate * 4)  # dividing by radius**2
3.141068
CHAPTER
FOUR
FUNCTIONS
4.1 Overview
import numpy as np
import matplotlib.pyplot as plt
4.2 Function Basics

Python has a number of built-in functions that are available without import.
We have already met some
max(19, 20)
20
print('foobar')
foobar
str(22)
'22'
type(22)
int
If the built-in functions don’t cover what we need, we either need to import functions or create our own.
Examples of importing and using functions were given in the previous lecture
Here’s another one, which tests whether a given year is a leap year:
import calendar
calendar.isleap(2024)
True
4.3 Defining Functions

Here’s a very simple Python function, that implements the mathematical function 𝑓(𝑥) = 2𝑥 + 1
def f(x):
    return 2 * x + 1
Now that we’ve defined this function, let’s call it and check whether it does what we expect:
f(1)
3
f(10)
21
Here’s a longer function, that computes the absolute value of a given number.
(Such a function already exists as a built-in, but let’s write our own for the exercise.)
def new_abs_function(x):
    if x < 0:
        abs_value = -x
    else:
        abs_value = x
    return abs_value
print(new_abs_function(3))
print(new_abs_function(-3))
3
3
Note that a function can have arbitrarily many return statements (including zero).
Execution of the function terminates when the first return is hit, allowing code like the following example
def f(x):
    if x < 0:
        return 'negative'
    return 'nonnegative'
(Writing functions with multiple return statements is typically discouraged, as it can make logic hard to follow.)
Functions without a return statement automatically return the special Python object None.
Keyword Arguments
In a previous lecture, you came across the statement
plt.plot(x, label="white noise")
In this call to Matplotlib’s plot function, notice that the last argument is passed in name=argument syntax.
This is called a keyword argument, with label being the keyword.
Non-keyword arguments are called positional arguments, since their meaning is determined by order
You can adopt keyword arguments in user-defined functions with no difficulty.
The next example illustrates the syntax
def f(x, a=1, b=1):
    return a + b * x
The keyword argument values we supplied in the definition of f become the default values
f(2)
3
They can be modified as follows
f(2, a=4, b=5)
14
One-Line Functions: lambda
The lambda keyword is used to create simple functions on one line.
For example, the definitions
def f(x):
    return x**3
and
f = lambda x: x**3
are entirely equivalent.
To see why lambda is useful, suppose that we want to calculate ∫_{0}^{2} 𝑥³ 𝑑𝑥.
The SciPy library has a function called quad that will do this calculation for us.
The syntax of the quad function is quad(f, a, b) where f is a function and a and b are numbers.
To create the function 𝑓(𝑥) = 𝑥³ on one line, we can use lambda
from scipy.integrate import quad

quad(lambda x: x**3, 0, 2)
(4.0, 4.440892098500626e-14)
Here the function created by lambda is said to be anonymous because it was never given a name.
User-defined functions are important for improving the clarity of your code by
• separating different strands of logic
• facilitating code reuse
(Writing the same thing twice is almost always a bad idea)
We will say more about this later.
4.4 Applications

Random Draws
Consider again this program from earlier, which plots IID draws from the standard normal distribution
ts_length = 100
ϵ_values = []   # empty list

for i in range(ts_length):
    e = np.random.randn()
    ϵ_values.append(e)

plt.plot(ϵ_values)
plt.show()
We will break this program into two parts:
1. A user-defined function that generates a list of random variables.
2. Code that uses the function to make the plot.
The first part looks as follows
def generate_data(n):
    ϵ_values = []
    for i in range(n):
        e = np.random.randn()
        ϵ_values.append(e)
    return ϵ_values
data = generate_data(100)
plt.plot(data)
plt.show()
When the interpreter gets to the expression generate_data(100), it executes the function body with n set equal to
100.
The net result is that the name data is bound to the list ϵ_values returned by the function.
Adding Conditions
Our function generate_data() is rather limited.
Let’s make it slightly more useful by giving it the ability to return either standard normals or uniform random variables on (0, 1) as required.
This is achieved in the next piece of code
def generate_data(n, generator_type):
    ϵ_values = []
    for i in range(n):
        if generator_type == 'U':
            e = np.random.uniform(0, 1)
        else:
            e = np.random.randn()
        ϵ_values.append(e)
    return ϵ_values

data = generate_data(100, 'U')
plt.plot(data)
plt.show()

Hopefully, the syntax of the if/else clause is self-explanatory, with indentation again delimiting the extent of the code
blocks.
Notes
• We are passing the argument U as a string, which is why we write it as 'U'.
• Notice that equality is tested with the == syntax, not =.
– For example, the statement a = 10 assigns the name a to the value 10.
– The expression a == 10 evaluates to either True or False, depending on the value of a (a quick illustration follows).
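Here is that illustration (the values are arbitrary):
a = 10    # assignment: binds the name a to 10
a == 10   # comparison: evaluates to a Boolean
True
a == 9
False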
Now, there are several ways that we can simplify the code above.
For example, we can get rid of the conditionals all together by just passing the desired generator type as a function.
To understand this, consider the following version.
def generate_data(n, generator_type):
    ϵ_values = []
    for i in range(n):
        e = generator_type()
        ϵ_values.append(e)
    return ϵ_values

data = generate_data(100, np.random.uniform)
plt.plot(data)
plt.show()
Now, when we call the function generate_data(), we pass np.random.uniform as the second argument.
This object is a function.
When the function call generate_data(100, np.random.uniform) is executed, Python runs the function
code block with n equal to 100 and the name generator_type “bound” to the function np.random.uniform.
• While these lines are executed, the names generator_type and np.random.uniform are “synonyms”,
and can be used in identical ways.
This principle works more generally—for example, consider the following piece of code
m = max
m(7, 2, 4)
7
Here we created another name for the built-in function max(), which could then be used in identical ways.
In the context of our program, the ability to bind new names to functions means that there is no problem passing a function
as an argument to another function—as we did above.
4.5 Recursive Function Calls (Advanced)

This is an advanced topic that you should feel free to skip.
Basically, a recursive function is a function that calls itself.
For example, consider computing 𝑥_𝑡 when 𝑥_{𝑡+1} = 2𝑥_𝑡 and 𝑥_0 = 1; obviously the answer is 2^𝑡.
We can compute this easily enough with a loop
def x_loop(t):
    x = 1
    for i in range(t):
        x = 2 * x
    return x
We can also use a recursive solution, as follows
def x(t):
    if t == 0:
        return 1
    else:
        return 2 * x(t-1)
What happens here is that each successive call uses its own frame in the stack
• a frame is where the local variables of a given function call are held
• the stack is memory used to process function calls, organized as a First In Last Out (FILO) queue (see the sketch below)
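To see the frames at work, here is a hypothetical instrumented version of x (the print calls are ours, added purely for illustration):
def x(t):
    print(f"entering frame for t = {t}")
    if t == 0:
        result = 1
    else:
        result = 2 * x(t-1)
    print(f"leaving frame for t = {t} with result {result}")
    return result

x(2)

entering frame for t = 2
entering frame for t = 1
entering frame for t = 0
leaving frame for t = 0 with result 1
leaving frame for t = 1 with result 2
leaving frame for t = 2 with result 4
4

Notice that the first frame entered (t = 2) is the last to finish, which is exactly the FILO pattern described above.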
This example is somewhat contrived, since the first (iterative) solution would usually be preferred to the recursive solution.
We’ll meet less contrived applications of recursion later on.
4.6 Exercises
Exercise 4.6.1
Recall that 𝑛! is read as “𝑛 factorial” and defined as 𝑛! = 𝑛 × (𝑛 − 1) × ⋯ × 2 × 1.
We will only consider 𝑛 as a positive integer here.
There are functions to compute this in various modules, but let’s write our own version as an exercise.
In particular, write a function factorial such that factorial(n) returns 𝑛! for any positive integer 𝑛.
Here’s one solution:
def factorial(n):
    k = 1
    for i in range(n):
        k = k * (i + 1)
    return k
factorial(4)
24
Exercise 4.6.2
The binomial random variable 𝑌 ∼ 𝐵𝑖𝑛(𝑛, 𝑝) represents the number of successes in 𝑛 binary trials, where each trial
succeeds with probability 𝑝.
Without any import besides from numpy.random import uniform, write a function binomial_rv such that
binomial_rv(n, p) generates one draw of 𝑌 .
Hint: If 𝑈 is uniform on (0, 1) and 𝑝 ∈ (0, 1), then the expression U < p evaluates to True with probability 𝑝.
Here’s one solution:
from numpy.random import uniform

def binomial_rv(n, p):
    count = 0
    for i in range(n):
        U = uniform()
        if U < p:
            count = count + 1    # Or count += 1
    return count

binomial_rv(10, 0.5)
Exercise 4.6.3
First, write a function that returns one realization of the following random device
1. Flip an unbiased coin 10 times.
2. If a head occurs k or more times consecutively within this sequence at least once, pay one dollar.
3. If not, pay nothing.
Second, write another function that does the same task except that the second rule of the above random device becomes
• If a head occurs k or more times within this sequence, pay one dollar.
Here’s a function for the first random device:
from numpy.random import uniform

def draw(k):  # pays if there are k consecutive heads
    payoff = 0
    count = 0
    for i in range(10):
        U = uniform()
        count = count + 1 if U < 0.5 else 0
        print(count)  # print counts for clarity
        if count == k:
            payoff = 1
    return payoff

draw(3)
0
0
0
0
1
0
0
1
2
0
Here’s a function for the second random device:

def draw_new(k):  # pays if there are k heads in total
    payoff = 0
    count = 0
    for i in range(10):
        U = uniform()
        count = count + (1 if U < 0.5 else 0)
        print(count)
        if count == k:
            payoff = 1
    return payoff

draw_new(3)
1
2
2
2
3
3
3
3
4
5
4.7 Advanced Exercises

Exercise 4.7.1
The Fibonacci numbers are defined by
𝑥_{𝑡+1} = 𝑥_𝑡 + 𝑥_{𝑡−1}, with 𝑥_0 = 0 and 𝑥_1 = 1
The first few numbers in the sequence are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55.
Write a function to recursively compute the 𝑡-th Fibonacci number for any 𝑡.
def x(t):
    if t == 0:
        return 0
    if t == 1:
        return 1
    else:
        return x(t-1) + x(t-2)
Let’s test it
print([x(i) for i in range(10)])
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
Exercise 4.7.2
Rewrite the function factorial() from Exercise 4.6.1 using recursion.
def recursion_factorial(n):
    if n == 1:
        return n
    else:
        return n * recursion_factorial(n-1)
Let’s test it
recursion_factorial(4)
24
CHAPTER
FIVE
PYTHON ESSENTIALS
5.1 Overview
5.2 Data Types

Boolean Values
One simple data type is Boolean values, which can be either True or False
x = True
x
True
We can check the type of any object in memory using the type() function.
type(x)
bool
In the next line of code, the interpreter evaluates the expression on the right of = and binds y to this value
y = 100 < 10
y
False
type(y)
bool
In arithmetic expressions, True is converted to 1 and False is converted to 0.
This is called Boolean arithmetic and is often useful in programming.
Here are some examples
x + y
1
x * y
0
True + True
2
bools = [True, True, False, True]  # List of Boolean values
sum(bools)
3
Numeric Types
The two most common data types used to represent numbers are integers and floats.
Complex numbers are another primitive data type in Python
x = complex(1, 2)
y = complex(2, 1)
print(x * y)
5j
type(x)
complex
5.2.2 Containers
Python has several basic types for storing collections of (possibly heterogeneous) data.
We’ve already discussed lists.
A related data type is tuples, which are “immutable” lists
x = ('a', 'b')  # Parentheses instead of the square brackets
x
('a', 'b')
type(x)
tuple
In Python, an object is called immutable if, once created, the object cannot be changed.
Conversely, an object is mutable if it can still be altered after creation.
Python lists are mutable
x = [1, 2]
x[0] = 10
x
[10, 2]
x = (1, 2)
x[0] = 10
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[13], line 2
1 x = (1, 2)
----> 2 x[0] = 10
We’ll say more about the role of mutable and immutable data a bit later.
Tuples (and lists) can be “unpacked” as follows
integers = (10, 20, 30)
x, y, z = integers
x
10
y
20
Slice Notation
To access multiple elements of a sequence (a list, a tuple or a string), you can use Python’s slice notation.
For example,
a = ["a", "b", "c", "d", "e"]
a[1:3]
['b', 'c']
The general rule is that a[m:n] returns n − m elements, starting at a[m].
Negative numbers are also permissible
a[-2:]  # Select the last two elements
['d', 'e']
You can also use the format [start:end:step] to specify the step
a[::2]  # Step by two
['a', 'c', 'e']
Using a negative step, you can return the sequence in a reversed order
a[-2::-1]  # Walk backwards from the second last element to the first element
['d', 'c', 'b', 'a']
The same slice notation works on strings
s = 'foobar'
s[-3:] # Select the last three elements
'bar'
Two other container types we should mention before moving on are sets and dictionaries.
Dictionaries are much like lists, except that the items are named instead of numbered
d = {'name': 'Frodo', 'age': 33}
type(d)
dict
Here the keys are 'name' and 'age', and they are mapped to the values 'Frodo' and 33, which can be accessed as follows
d['age']
33
Sets are unordered collections without duplicates, and set methods provide the usual set-theoretic operations
s1 = {'a', 'b'}
type(s1)
set
s2 = {'b', 'c'}
s1.issubset(s2)
False
s1.intersection(s2)
{'b'}
The set() function creates sets from sequences
s3 = set(('foo', 'bar', 'foo'))
s3  # Unique elements only
{'bar', 'foo'}
5.3 Input and Output

Let’s briefly review reading and writing to text files, starting with writing
f = open('newfile.txt', 'w')   # Open 'newfile.txt' for writing
f.write('Testing\n')           # Here '\n' means new line
f.write('Testing again')
f.close()
Here
• The built-in function open() creates a file object for writing to.
• Both write() and close() are methods of file objects.
Where is this file that we’ve created?
Recall that Python maintains a concept of the present working directory (pwd) that can be located from within Jupyter or
IPython via
%pwd
'/home/runner/work/lecture-python-programming.myst/lecture-python-programming.myst/lectures'
The file we created is in the pwd; we can read its contents as follows
f = open('newfile.txt', 'r')
out = f.read()
out
'Testing\nTesting again'
print(out)
Testing
Testing again
In fact, the recommended approach in modern Python is to use a with statement to ensure the files are properly acquired
and released.
Containing the operations within the same block also improves the clarity of your code.
Let’s try to convert the two examples above into a with statement.
We change the writing example first
with open('newfile.txt', 'w') as f:
    f.write('Testing\n')
    f.write('Testing again')
Note that we do not need to call the close() method since the with block will ensure the stream is closed at the end
of the block.
With slight modifications, we can also read files using with
with open('newfile.txt', 'r') as f:
    out = f.read()
    print(out)
Testing
Testing again
Now suppose that we want to read input from one file and write output to another. Here’s how we could accomplish this
task while correctly acquiring and returning resources to the operating system using with statements:
with open("newfile.txt", "r") as f:
    file = f.readlines()
    with open("output.txt", "w") as fo:
        for i, line in enumerate(file):
            fo.write(f'Line {i}: {line} \n')
The output file now contains
Line 0: Testing
Line 1: Testing again
We can simplify the example above by grouping the two with statements into one line
with open("newfile.txt", "r") as f, open("output.txt", "w") as fo:
    for i, line in enumerate(f):
        fo.write(f'Line {i}: {line} \n')
The output is the same as before:
Line 0: Testing
Line 1: Testing again
Suppose we want to continue to write into the existing file instead of overwriting it.
We can switch the mode to a, which stands for append mode
with open('output.txt', 'a') as fo:
    fo.write('\nThis is the end of the file')

Reading the file again shows the appended line at the end:
Line 0: Testing
Line 1: Testing again
This is the end of the file
Note: Note that we only covered r, w, and a mode here, which are the most commonly used modes. Python provides a
variety of modes that you could experiment with.
5.3.1 Paths
Note that if newfile.txt is not in the present working directory then this call to open() fails.
In this case, you can shift the file to the pwd or specify the full path to the file
f = open('insert_full_path_to_file/newfile.txt', 'r')
5.4 Iterating
One of the most important tasks in computing is stepping through a sequence of data and performing a given action.
One of Python’s strengths is its simple, flexible interface to this kind of iteration via the for loop.
Many Python objects are “iterable”, in the sense that they can be looped over.
To give an example, let’s write the file us_cities.txt, which lists US cities and their population, to the present working
directory.
%%writefile us_cities.txt
new york: 8244910
los angeles: 3819702
chicago: 2707120
houston: 2145146
philadelphia: 1536471
phoenix: 1469471
san antonio: 1359758
san diego: 1326179
dallas: 1223229
Overwriting us_cities.txt
Suppose that we want to make the information more readable, by capitalizing names and adding commas to mark thousands.
The program below reads the data in and makes the conversion:
data_file = open('us_cities.txt', 'r')
for line in data_file:
    city, population = line.split(':')            # Tuple unpacking at each iteration
    city = city.title()                           # Capitalize city names
    population = '{0:,}'.format(int(population))  # Add commas to numbers
    print(city.ljust(15) + population)
data_file.close()

New York       8,244,910
Los Angeles    3,819,702
Chicago        2,707,120
Houston        2,145,146
Philadelphia   1,536,471
Phoenix        1,469,471
San Antonio    1,359,758
San Diego      1,326,179
Dallas         1,223,229

Here format() is a string method used for inserting variables into strings.
The reformatting of each line is the result of three different string methods, the details of which can be left till later.
The interesting part of this program for us is line 2, which shows that
1. The file object data_file is iterable, in the sense that it can be placed to the right of in within a for loop.
2. Iteration steps through each line in the file.
This leads to the clean, convenient syntax shown in our program.
Many other kinds of objects are iterable, and we’ll discuss some of them later on.
One thing you might have noticed is that Python tends to favor looping without explicit indexing.
For example,
x_values = [1, 2, 3]  # Some iterable x

for x in x_values:
    print(x * x)

1
4
9
is preferred to
for i in range(len(x_values)):
    print(x_values[i] * x_values[i])
1
4
9
When you compare these two alternatives, you can see why the first one is preferred.
Python provides some facilities to simplify looping without indices.
One is zip(), which is used for stepping through pairs from two sequences.
For example, try running the following code
countries = ('Japan', 'Korea', 'China')
cities = ('Tokyo', 'Seoul', 'Beijing')

for country, city in zip(countries, cities):
    print(f'The capital of {country} is {city}')

The capital of Japan is Tokyo
The capital of Korea is Seoul
The capital of China is Beijing
The zip() function is also useful for creating dictionaries — for example
names = ['Tom', 'John']
marks = ['E', 'F']
dict(zip(names, marks))
{'Tom': 'E', 'John': 'F'}
If we actually need the index from a list, one option is to use enumerate().
To understand what enumerate() does, consider the following example
letter_list = ['a', 'b', 'c']

for index, letter in enumerate(letter_list):
    print(f"letter_list[{index}] = '{letter}'")
letter_list[0] = 'a'
letter_list[1] = 'b'
letter_list[2] = 'c'
We can also simplify the code for generating the list of random draws considerably by using something called a list
comprehension.
List comprehensions are an elegant Python tool for creating lists.
Consider the following example, where the list comprehension is on the right-hand side of the second line
animals = ['dog', 'cat', 'bird']
plurals = [animal + 's' for animal in animals]
plurals
['dogs', 'cats', 'birds']
Here’s another example
range(8)
range(0, 8)
doubles = [2 * x for x in range(8)]
doubles
[0, 2, 4, 6, 8, 10, 12, 14]
5.5 Comparisons and Logical Operators

5.5.1 Comparisons
Many different kinds of expressions evaluate to one of the Boolean values (i.e., True or False).
A common type is comparisons, such as
x, y = 1, 2
x < y
True
x > y
False
1 < 2 < 3
True
1 <= 2 <= 3
True
x = 1 # Assignment
x == 2 # Comparison
False
1 != 2
True
Note that when testing conditions, we can use any valid Python expression
x = 'yes' if 42 else 'no'
x
'yes'
x = 'yes' if [] else 'no'
x
'no'
What’s going on here? The rule is:
• Expressions that evaluate to zero, empty sequences or containers (strings, lists, etc.) and None are all equivalent to False.
• All other values are equivalent to True.

5.5.2 Combining Expressions
We can combine expressions using and, or and not.
These are the standard logical connectives (conjunction, disjunction and denial)
1 < 2 and 'f' in 'foo'
True
1 < 2 and 'g' in 'foo'
False
1 < 2 or 'g' in 'foo'
True
not True
False
not not True
True
Remember
• P and Q is True if both are True, else False
• P or Q is False if both are False, else True
We can also use all() and any() to test a sequence of expressions
all([1 <= 2 <= 3, 5 <= 6 <= 7])
True
all([1 <= 2 <= 3, "a" in "letter"])
False
any([1 <= 2 <= 3, "a" in "letter"])
True
Note:
• all() returns True when all boolean values/expressions in the sequence are True
• any() returns True when any boolean values/expressions in the sequence are True (see the short-circuit example below)
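One practical detail: any() stops scanning as soon as it hits a True value (and all() stops at the first False). Here is a small illustrative check (the helper function is ours, added just to make the evaluation visible):
def loud(b):
    print('evaluating', b)
    return b

# any() returns after the first True, so the second element is never evaluated
any(loud(b) for b in [True, False])

evaluating True
True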
5.6 Coding Style and Documentation

A consistent coding style and the use of documentation can make the code easier to understand and maintain.
You can find Python’s programming philosophy by typing import this at the prompt.
Among other things, Python strongly favors consistency in programming style.
We’ve all heard the saying about consistency and little minds.
In programming, as in mathematics, the opposite is true
• A mathematical paper where the symbols ∪ and ∩ were reversed would be very hard to read, even if the author
told you so on the first page.
In Python, the standard style is set out in PEP8.
(Occasionally we’ll deviate from PEP8 in these lectures to better match mathematical notation)
5.6.2 Docstrings
Python has a system for adding comments to modules, classes, functions, etc. called docstrings.
The nice thing about docstrings is that they are available at run-time.
Try running this
def f(x):
    """
    This function squares its argument
    """
    return x**2
f?
Type: function
String Form:<function f at 0x2223320>
File: /home/john/temp/temp.py
Definition: f(x)
Docstring: This function squares its argument
f??
Type: function
String Form:<function f at 0x2223320>
File: /home/john/temp/temp.py
Definition: f(x)
Source:
def f(x):
    """
    This function squares its argument
    """
    return x**2
With one question mark we bring up the docstring, and with two we get the source code as well.
You can find conventions for docstrings in PEP257.
5.7 Exercises
Exercise 5.7.1
Part 1: Given two numeric lists or tuples x_vals and y_vals of equal length, compute their inner product using
zip().
Part 2: In one line, count the number of even numbers in 0,…,99.
Part 3: Given pairs = ((2, 5), (4, 2), (9, 8), (12, 10)), count the number of pairs (a, b) such
that both a and b are even.
Part 1 Solution:
Here’s one possible solution
x_vals = [1, 2, 3]
y_vals = [1, 1, 1]
sum([x * y for x, y in zip(x_vals, y_vals)])
6
Part 2 Solution:
One solution is
sum([x % 2 == 0 for x in range(100)])
50
This also works
sum(x % 2 == 0 for x in range(100))
50
Some less natural alternatives that nonetheless help to illustrate the flexibility of list comprehensions are
len([x for x in range(100) if x % 2 == 0])
50
and
sum([1 for x in range(100) if x % 2 == 0])
50
Part 3 Solution:
Here’s one possibility
pairs = ((2, 5), (4, 2), (9, 8), (12, 10))
sum([x % 2 == 0 and y % 2 == 0 for x, y in pairs])
2
Exercise 5.7.2
Consider the polynomial
𝑝(𝑥) = 𝑎_0 + 𝑎_1 𝑥 + 𝑎_2 𝑥² + ⋯ + 𝑎_𝑛 𝑥ⁿ = ∑_{𝑖=0}^{𝑛} 𝑎_𝑖 𝑥^𝑖    (5.1)
Write a function p such that p(x, coeff) computes the value in (5.1) given a point x and a list of coefficients
coeff (𝑎_0, 𝑎_1, …, 𝑎_𝑛).
Try to use enumerate() in your loop.
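The exercise leaves the implementation open; here is one possible solution using enumerate(), together with a quick check (the test values are our own):
def p(x, coeff):
    # enumerate pairs each coefficient a_i with its power i
    return sum(a * x**i for i, a in enumerate(coeff))

p(1, (2, 4))   # 2 * 1**0 + 4 * 1**1

6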
Exercise 5.7.3
Write a function that takes a string as an argument and returns the number of capital letters in the string.
Here’s one solution:
def f(string):
    count = 0
    for letter in string:
        if letter == letter.upper() and letter.isalpha():
            count += 1
    return count
An alternative, more pythonic solution:
def count_uppercase_chars(s):
    return sum([c.isupper() for c in s])
Exercise 5.7.4
Write a function that takes two sequences seq_a and seq_b as arguments and returns True if every element in seq_a
is also an element of seq_b, else False.
• By “sequence” we mean a list, a tuple or a string.
• Do the exercise without using sets and set methods.
Here’s one solution:
def f(seq_a, seq_b):
    is_subset = True
    for a in seq_a:
        if a not in seq_b:
            is_subset = False
    return is_subset

# == test == #
print(f("ab", "cadb"))
print(f("ab", "cjdb"))
print(f([1, 2], [1, 2, 3]))
print(f([1, 2, 3], [1, 2]))
True
False
True
False
An alternative, more compact solution:
def f(seq_a, seq_b):
    return all([i in seq_b for i in seq_a])

# == test == #
print(f("ab", "cadb"))
print(f("ab", "cjdb"))
print(f([1, 2], [1, 2, 3]))
print(f([1, 2, 3], [1, 2]))
True
False
True
False
Of course, if we use the sets data type then the solution is easier
def f(seq_a, seq_b):
    return set(seq_a).issubset(set(seq_b))
Exercise 5.7.5
When we cover the numerical libraries, we will see they include many alternatives for interpolation and function approx-
imation.
Nevertheless, let’s write our own function approximation routine as an exercise.
In particular, without using any imports, write a function linapprox that takes as arguments
• A function f mapping some interval [𝑎, 𝑏] into ℝ.
• Two scalars a and b providing the limits of this interval.
• An integer n determining the number of grid points.
• A number x satisfying a <= x <= b.
and returns the piecewise linear interpolation of f at x, based on n evenly spaced grid points a = point[0] < point[1] < ... < point[n-1] = b.
Here’s one solution:
def linapprox(f, a, b, n, x):
    """
    Evaluates the piecewise linear interpolant of f at x on the interval
    [a, b], with n evenly spaced grid points.

    Parameters
    ==========
    f : function
        The function to approximate
    x, a, b : scalars (floats or integers)
        Evaluation point and endpoints, with a <= x <= b
    n : integer
        Number of grid points

    Returns
    =======
    A float. The interpolant evaluated at x
    """
    length_of_interval = b - a
    num_subintervals = n - 1
    step = length_of_interval / num_subintervals

    # === find first grid point larger than x === #
    point = a
    while point <= x:
        point += step

    # === x must lie between the gridpoints (point - step) and point === #
    u, v = point - step, point

    return f(u) + (x - u) * (f(v) - f(u)) / (v - u)
Exercise 5.7.6
Using list comprehension syntax, we can simplify the loop in the following code.
import numpy as np

n = 100
ϵ_values = []
for i in range(n):
    e = np.random.randn()
    ϵ_values.append(e)
n = 100
ϵ_values = [np.random.randn() for i in range(n)]
CHAPTER
SIX
OOP I: OBJECTS AND METHODS
6.1 Overview
The traditional programming paradigm (think Fortran, C, MATLAB, etc.) is called procedural.
It works as follows
• The program has a state corresponding to the values of its variables.
• Functions are called to act on and transform the state.
• Final outputs are produced via a sequence of function calls.
Two other important paradigms are object-oriented programming (OOP) and functional programming.
In the OOP paradigm, data and functions are bundled together into “objects” — and functions in this context are referred
to as methods.
Methods are called on to transform the data contained in the object.
• Think of a Python list that contains data and has methods such as append() and pop() that transform the data.
Functional programming languages are built on the idea of composing functions.
• Influential examples include Lisp, Haskell and Elixir.
So which of these categories does Python fit into?
Actually Python is a pragmatic language that blends object-oriented, functional and procedural styles, rather than taking
a purist approach.
On one hand, this allows Python and its users to cherry pick nice aspects of different paradigms.
On the other hand, the lack of purity might at times lead to some confusion.
Fortunately this confusion is minimized if you understand that, at a foundational level, Python is object-oriented.
By this we mean that, in Python, everything is an object.
In this lecture, we explain what that statement means and why it matters.
We'll make use of the following third party library

!pip install rich
6.2 Objects
In Python, an object is a collection of data and instructions held in computer memory that consists of
1. a type
2. a unique identity
3. data (i.e., content)
4. methods
These concepts are defined and discussed sequentially below.
6.2.1 Type
Python provides for different types of objects, to accommodate different categories of data.
For example
s = 'This is a string'
type(s)
str
Here the type is str. Similarly, integers have type int

x = 42
type(x)

int
'300' + 'cc'
'300cc'
300 + 400
700
'300' + 400
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[6], line 1
----> 1 '300' + 400
Here we are mixing types, and it’s unclear to Python whether the user wants to
• convert '300' to an integer and then add it to 400, or
• convert 400 to string and then concatenate it with '300'
Some languages might try to guess but Python is strongly typed
• Type is important, and implicit type conversion is rare.
• Python will respond instead by raising a TypeError.
To avoid the error, you need to clarify by changing the relevant type.
For example,

int('300') + 400   # To add as numbers, change the string to an integer

700
6.2.2 Identity
In Python, each object has a unique identifier, which helps Python (and us) keep track of the object.
The identity of an object can be obtained via the id() function
y = 2.5
z = 2.5
id(y)
140212571074288
id(z)
140212571074032
In this example, y and z happen to have the same value (i.e., 2.5), but they are not the same object.
The identity of an object is in fact just the address of the object in memory.
6.2.3 Object Content: Data and Attributes

If we set x = 42 then we create an object of type int that contains the data 42.
In fact, it contains more, as the following example shows
x = 42
x
42
x.imag

0
x.__class__
int
When Python creates this integer object, it stores with it various auxiliary information, such as the imaginary part, and
the type.
Any name following a dot is called an attribute of the object to the left of the dot.
• e.g., imag and __class__ are attributes of x.
We see from this example that objects have attributes that contain auxiliary information.
They also have attributes that act like functions, called methods.
These attributes are important, so let’s discuss them in-depth.
6.2.4 Methods

Methods are attributes of an object that are callable, i.e., attributes that can be called as functions.

x = ['foo', 'bar']
callable(x.append)
True
callable(x.__doc__)
False
Methods typically act on the data contained in the object they belong to, or combine that data with other data
x = ['a', 'b']
x.append('c')
s = 'This is a string'
s.upper()
'THIS IS A STRING'
s.lower()
'this is a string'
s.replace('This', 'That')
'That is a string'
x = ['a', 'b']
x[0] = 'aa' # Item assignment using square bracket notation
x
['aa', 'b']
It doesn’t look like there are any methods used here, but in fact the square bracket assignment notation is just a convenient
interface to a method call.
What actually happens is that Python calls the __setitem__ method, as follows
x = ['a', 'b']
x.__setitem__(0, 'aa') # Equivalent to x[0] = 'aa'
x
['aa', 'b']
(If you wanted to you could modify the __setitem__ method, so that square bracket assignment does something
totally different)
There’s a nice package called rich that helps us view the contents of an object.
For example,
from rich import inspect
inspect(10, methods=True)
(The output is a formatted panel displaying the value 10, its data attributes denominator = 1, imag = 0, numerator = 10 and real = 10, and its methods, such as as_integer_ratio, which returns a pair of integers whose ratio equals the original int.)
In fact there are still more methods, as you can see if you execute inspect(10, all=True).
In this lecture we claimed that Python is, at heart, an object oriented language.
But here’s an example that looks more procedural.
x = ['a', 'b']
m = len(x)
m

2

Isn't len() a procedural-style function call? The answer is no: behind the scenes, len(x) simply calls the list's __len__ method. In other words,

x = ['a', 'b']
len(x)

2

and

x = ['a', 'b']
x.__len__()

2

are equivalent.
6.5 Summary

The message of this lecture is that, in Python, everything in memory is treated as an object: not just lists and strings, but also less obvious things, such as functions and modules.
6.6 Exercises
Exercise 6.6.1
We have met the boolean data type previously.
Using what we have learnt in this lecture, print a list of methods of the boolean object True.
Hint: You can use callable() to test whether an attribute of an object can be called as a function
print(sorted(True.__dir__()))
or
print(sorted(dir(True)))
Since the boolean data type is a primitive type, you can also find it in the built-in namespace
print(dir(__builtins__.bool))
Here we use a for loop to filter out the attributes that are callable

attributes = dir(__builtins__.bool)
callables = []

for attribute in attributes:
    # getattr unpacks the attribute so we can test whether it is callable
    if callable(getattr(True, attribute)):
        callables.append(attribute)

print(callables)
SEVEN

NAMES AND NAMESPACES
7.1 Overview
This lecture is all about variable names, how they can be used and how they are understood by the Python interpreter.
This might sound a little dull but the model that Python has adopted for handling names is elegant and interesting.
In addition, you will save yourself many hours of debugging if you have a good understanding of how names work in
Python.
7.2 Variable Names in Python

Consider the Python statement

x = 42
We now know that when this statement is executed, Python creates an object of type int in your computer’s memory,
containing
• the value 42
• some associated attributes
But what is x itself?
In Python, x is called a name, and the statement x = 42 binds the name x to the integer object we have just discussed.
Under the hood, this process of binding names to objects is implemented as a dictionary—more about this in a moment.
There is no problem binding two or more names to the same object, regardless of what that object is

def f(string):      # Create a function called f
    print(string)   # that prints any string it's passed

g = f
id(g) == id(f)

True

g('test')
test
In the first step, a function object is created, and the name f is bound to it.
After binding the name g to the same object, we can use it anywhere we would use f.
What happens when the number of names bound to an object goes to zero?
Here’s an example of this situation, where the name x is first bound to one object and then rebound to another
x = 'foo'
id(x)
x = 'bar'
id(x)
140561686619472
In this case, after we rebind x to 'bar', no names remain bound to the first object 'foo'.
This is a trigger for 'foo' to be garbage collected.
In other words, the memory slot that stores that object is deallocated and returned to the operating system.
Garbage collection is actually an active research area in computer science.
You can read more on garbage collection if you are interested.
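For instance, one way to watch Python's reference bookkeeping in action is the standard library's sys module (a small illustration; note that getrefcount itself adds a temporary reference):

import sys

x = 'some longish string'
sys.getrefcount(x)   # Number of references to the object bound to x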
7.3 Namespaces
Recall that a statement such as

x = 42

binds the name x to an object, and that this binding is implemented via a kind of dictionary. Such a dictionary is called a namespace.
Definition
A namespace is a symbol table that maps names to objects in memory.
Python uses multiple namespaces, creating them on the fly as necessary. For example, every time we import a module, Python creates a namespace for that module. To see this in action, suppose we write a script mathfoo.py with a single line

%%file mathfoo.py
pi = 'foobar'
Writing mathfoo.py
Now we start the Python interpreter and import it

import mathfoo
Next let’s import the math module from the standard library
import math
math.pi
3.141592653589793
mathfoo.pi
'foobar'
These two different bindings of pi exist in different namespaces, each one implemented as a dictionary.
If you wish, you can look at the dictionary directly, using module_name.__dict__.
import math
math.__dict__.items()
(a long dictionary mapping the names defined in the math module, such as 'pi', to their values is printed)
import mathfoo
mathfoo.__dict__
{'__name__': 'mathfoo',
 '__doc__': None,
 '__package__': '',
 '__loader__': <_frozen_importlib_external.SourceFileLoader at 0x7fd70ff65100>,
 '__spec__': ModuleSpec(name='mathfoo', loader=..., origin='/home/runner/work/lecture-python-programming.myst/lecture-python-programming.myst/lectures/mathfoo.py'),
 '__file__': '/home/runner/work/lecture-python-programming.myst/lecture-python-programming.myst/lectures/mathfoo.py',
 '__cached__': '/home/runner/work/lecture-python-programming.myst/lecture-python-programming.myst/lectures/__pycache__/mathfoo.cpython-312.pyc',
 'pi': 'foobar'}
As you know, we access elements of the namespace using the dotted attribute notation
math.pi
3.141592653589793
math.__dict__['pi']
3.141592653589793
The function vars() gives the same dictionary

vars(math).items()

(the same output as math.__dict__.items() is printed)
If we just want to peek at the names, we can look at a slice

dir(math)[0:10]

['__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__',
'acos',
'acosh',
'asin',
'asinh']
Notice the special names __doc__ and __name__, which are initialized in the namespace when a module is imported: __doc__ is the module's doc string, while __name__ is the module's name.

print(math.__doc__)

This module provides access to the mathematical functions
defined by the C standard.
math.__name__
'math'
print(__name__)
__main__
When we run a script using IPython’s run command, the contents of the file are executed as part of __main__ too.
To see this, let’s create a file mod.py that prints its own __name__ attribute
%%file mod.py
print(__name__)
Writing mod.py

Now let's run it in two ways: first via a standard import, then via IPython's run magic

import mod  # Standard import

mod

%run mod.py

__main__

In the second case, the code is executed as part of __main__, so __name__ is equal to __main__.
To see the contents of the namespace of __main__ we use vars() rather than vars(__main__).
If you do this in IPython, you will see a whole lot of variables that IPython needs, and has initialized when you started up
your session.
If you prefer to see only the variables you have initialized, use %whos
x = 2
y = 3
import numpy as np
%whos
The global namespace is the namespace of the module currently being executed or imported. For example, suppose that we start the interpreter and execute a statement such as

import amodule
At this point, the interpreter creates a namespace for the module amodule and starts executing commands in the module.
While this occurs, the namespace amodule.__dict__ is the global namespace.
Once execution of the module finishes, the interpreter returns to the module from where the import statement was made.
In this case it’s __main__, so the namespace of __main__ again becomes the global namespace.
Important fact: When we call a function, the interpreter creates a local namespace for that function, and registers the
variables in that namespace.
The reason for this will be explained in just a moment.
Variables in the local namespace are called local variables.
After the function returns, the namespace is deallocated and lost.
While the function is executing, we can view the contents of the local namespace with locals().
For example, consider
def f(x):
a = 2
print(locals())
return a * x
f(1)
{'x': 1, 'a': 2}
We have been using various built-in functions, such as max(), dir(), str(), list(), len(), range(),
type(), etc.
How does access to these names work?
• These definitions are stored in a module called builtins.
• They are accessible via a namespace called __builtins__.
The first ten names in the current interactive namespace look like this

dir()[0:10]

['In', 'Out', '_', '_10', '_11', '_12', '_13', '_14', '_15', '_16']

and the first ten names in the builtin namespace are

dir(__builtins__)[0:10]
['ArithmeticError',
'AssertionError',
'AttributeError',
'BaseException',
'BaseExceptionGroup',
'BlockingIOError',
'BrokenPipeError',
'BufferError',
'BytesWarning',
'ChildProcessError']
__builtins__.max
<function max>
But __builtins__ is special, because we can always access them directly as well
max
<function max>
__builtins__.max == max
True
Sometimes functions are defined within other functions, like so

def f():
    a = 2
    def g():
        b = 4
        print(a * b)
    g()
Here f is the enclosing function for g, and each function gets its own namespace.
Now we can give the rule for how namespace resolution works:
The order in which the interpreter searches for names is
1. the local namespace (if it exists)
2. the hierarchy of enclosing namespaces (if they exist)
3. the global namespace
4. the builtin namespace
If the name is not in any of these namespaces, the interpreter raises a NameError.
This is called the LEGB rule (local, enclosing, global, builtin).
Here’s an example that helps to illustrate.
Visualizations here are created by nbtutor in a Jupyter notebook.
They can help you better understand your program when you are learning a new language.
Consider a script test.py that looks as follows
%%file test.py
def g(x):
a = 1
x = x + a
return x
a = 0
y = g(10)
print("a = ", a, "y = ", y)
Writing test.py
%run test.py
a = 0 y = 11
First,

• The global namespace {} is created.
• The function object is created, and g is bound to it within the global namespace.
• The name a is bound to 0, again in the global namespace.

Next g is called via y = g(10), leading to the following sequence of actions

• The local namespace for the function is created.
• Local names x and a are bound, so that the local namespace becomes {'x': 10, 'a': 1}.
• Statement x = x + a uses the local a and local x to compute x + a, and binds local name x to the result.
• This value is returned, and y is bound to it in the global namespace.
• Local x and a are discarded (and the local namespace is deallocated).
This is a good time to say a little more about mutable vs immutable objects.
Consider the code segment
def f(x):
x = x + 1
return x
x = 1
print(f(x), x)
2 1
We now understand what will happen here: The code prints 2 as the value of f(x) and 1 as the value of x.
First f and x are registered in the global namespace.
The call f(x) creates a local namespace and adds x to it, bound to 1.
Next, this local x is rebound to the new integer object 2, and this value is returned.
None of this affects the global x.
However, it’s a different story when we use a mutable data type such as a list
def f(x):
x[0] = x[0] + 1
return x
x = [1]
print(f(x), x)
[2] [2]
Note: The global x and the local x refer to the same [1]
We can see the identity of local x and the identity of global x are the same
def f(x):
x[0] = x[0] + 1
print(f'the identity of local x is {id(x)}')
return x
x = [1]
print(f'the identity of global x is {id(x)}')
print(f(x), x)
• Within f(x), the statement x[0] = x[0] + 1 mutates the existing list in place rather than creating a new object, so the change is visible through the global x as well.
If you want to modify the local x and the global x separately, you can create a copy of the list and assign the copy to the
local x.
We will leave this for you to explore.
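For instance, a minimal sketch of the copy approach:

def f(x):
    x = list(x)       # Make a shallow copy, so mutations stay local
    x[0] = x[0] + 1
    return x

x = [1]
print(f(x), x)        # Prints [2] [1]: the global x is unchanged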
EIGHT

OOP II: BUILDING CLASSES
8.1 Overview
import numpy as np
import matplotlib.pyplot as plt
As discussed in an earlier lecture, in the OOP paradigm, data and functions are bundled together into "objects".
An example is a Python list, which not only stores data but also knows how to sort itself, etc.
x = [1, 5, 4]
x.sort()
x
[1, 4, 5]
As we now know, sort is a function that is “part of” the list object — and hence called a method.
If we want to make our own types of objects we need to use class definitions.
A class definition is a blueprint for a particular class of objects (e.g., lists, strings or complex numbers).
It describes
• What kind of data the class stores
• What methods it has for acting on these data
An object or instance is a realization of the class, created from the blueprint
• Each instance has its own unique data.
• Methods set out in the class definition act on this (and other) data.
In Python, the data and methods of an object are collectively referred to as attributes.
Attributes are accessed via “dotted attribute notation”
• object_name.data
• object_name.method_name()
In the example
x = [1, 5, 4]
x.sort()
x.__class__
list
• x is an object or instance, created from the definition for Python lists, but with its own particular data.
OOP is useful for the same reason that abstraction is useful: for recognizing and exploiting the common structure.
For example,
• a Markov chain consists of a set of states, an initial probability distribution over states, and a collection of proba-
bilities of moving across states
• a general equilibrium theory consists of a commodity space, preferences, technologies, and an equilibrium definition
• a game consists of a list of players, lists of actions available to each player, each player’s payoffs as functions of all
other players’ actions, and a timing protocol
These are all abstractions that collect together “objects” of the same “type”.
Recognizing common structure allows us to employ common tools.
In economic theory, this might be a proposition that applies to all games of a certain type.
In Python, this might be a method that’s useful for all Markov chains (e.g., simulate).
When we use OOP, the simulate method is conveniently bundled together with the Markov chain object.
def earn(w, y):
    "Consumer with initial wealth w earns y"
    return w + y

def spend(w, x):
    "Consumer with initial wealth w spends x"
    new_wealth = w - x
    if new_wealth < 0:
        print("Insufficient funds")
    else:
        return new_wealth
The earn function takes a consumer’s initial wealth 𝑤 and adds to it her current earnings 𝑦.
The spend function takes a consumer’s initial wealth 𝑤 and deducts from it her current spending 𝑥.
We can use these two functions to keep track of a consumer’s wealth as she earns and spends.
For example
w0 = 100
w1 = earn(w0, 10)
w2 = spend(w1, 20)
w2

90
A Class bundles a set of data tied to a particular instance together with a collection of functions that operate on the data.
In our example, an instance will be the name of a particular person, whose instance data consist solely of its wealth.
(In other examples instance data will consist of a vector of data.)
In our example, two functions earn and spend can be applied to the current instance data.
When bundled into a class in this way, such functions are called methods.
These can be readily accessed in ways that we shall describe now.
class Consumer:

    def __init__(self, w):
        "Initialize consumer with w dollars of wealth"
        self.wealth = w

    def earn(self, y):
        "The consumer earns y dollars"
        self.wealth += y

    def spend(self, x):
        "The consumer spends x dollars if feasible"
        new_wealth = self.wealth - x
        if new_wealth < 0:
            print("Insufficient funds")
        else:
            self.wealth = new_wealth
The earn and spend methods deploy the logic of the functions we described earlier, applied to the wealth instance data.
The __init__ method is a constructor method.
Whenever we create an instance of the class, the __init__ method will be called automatically.
Calling __init__ sets up a “namespace” to hold the instance data — more on this soon.
We’ll also discuss the role of the peculiar self bookkeeping device in detail below.
Usage
Here’s an example in which we use the class Consumer to create an instance of a consumer whom we affectionately
name 𝑐1.
After we create consumer 𝑐1 and endow it with initial wealth 10, we'll apply the spend method.

c1 = Consumer(10)   # Create instance with initial wealth 10
c1.spend(5)
c1.wealth

5

c1.earn(15)
c1.spend(100)

Insufficient funds
We can of course create multiple instances, i.e., multiple consumers, each with its own name and data
c1 = Consumer(10)
c2 = Consumer(12)
c2.spend(4)
c2.wealth

8
c1.wealth
10
Each instance, i.e., each consumer, stores its data in a separate namespace dictionary
c1.__dict__
{'wealth': 10}
c2.__dict__
{'wealth': 8}
When we access or set attributes we’re actually just modifying the dictionary maintained by the instance.
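For instance, adding an attribute by assignment shows up immediately in the dictionary (here interest_rate is a hypothetical extra attribute, used only for illustration):

c1.interest_rate = 0.05   # Hypothetical attribute, for illustration only
c1.__dict__               # Now shows both 'wealth' and 'interest_rate'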
Self
If you look at the Consumer class definition again you’ll see the word self throughout the code.
The rules for using self in creating a Class are that
• Any instance data should be prepended with self
– e.g., the earn method uses self.wealth rather than just wealth
• A method defined within the code that defines the class should have self as its first argument
– e.g., def earn(self, y) rather than just def earn(y)
• Any method referenced within the class should be called as self.method_name
There are no examples of the last rule in the preceding code but we will see some shortly.
Details
In this section, we look at some more formal details related to classes and self
• You might wish to skip to the next section the first time you read this lecture.
• You can return to these details after you’ve familiarized yourself with more examples.
Methods actually live inside a class object formed when the interpreter reads the class definition

print(Consumer.__dict__)   # Show __dict__ attribute of class object

{'__module__': '__main__', '__init__': <function Consumer.__init__ at ...>, 'earn': <function Consumer.earn at ...>, 'spend': <function Consumer.spend at ...>, ...}

Note how the three methods __init__, earn and spend are stored in the class object.
Consider the following code
c1 = Consumer(10)
c1.earn(10)
c1.wealth
20
When you call earn via c1.earn(10) the interpreter passes the instance c1 and the argument 10 to Consumer.earn.
In fact, the following are equivalent
• c1.earn(10)
• Consumer.earn(c1, 10)
In the function call Consumer.earn(c1, 10) note that c1 is the first argument.
Recall that in the definition of the earn method, self is the first parameter
The end result is that self is bound to the instance c1 inside the function call.
That’s why the statement self.wealth += y inside earn ends up modifying c1.wealth.
For our next example, let’s write a simple class to implement the Solow growth model.
The Solow growth model is a neoclassical growth model in which the per capita capital stock 𝑘𝑡 evolves according to the
rule
$$k_{t+1} = \frac{s z k_t^{\alpha} + (1 - \delta) k_t}{1 + n} \tag{8.1}$$
Here
• 𝑠 is an exogenously given saving rate
• 𝑧 is a productivity parameter
• 𝛼 is capital’s share of income
• 𝑛 is the population growth rate
• 𝛿 is the depreciation rate
A steady state of the model is a 𝑘 that solves (8.1) when 𝑘𝑡+1 = 𝑘𝑡 = 𝑘.
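Setting $k_{t+1} = k_t = k$ in (8.1) and rearranging gives the steady state in closed form:

$$k (1 + n) = s z k^{\alpha} + (1 - \delta) k
\quad \Longrightarrow \quad
k = \left( \frac{s z}{n + \delta} \right)^{1/(1-\alpha)}$$

This is exactly the expression computed by the steady_state method below.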
Here’s a class that implements this model.
Some points of interest in the code are
• An instance maintains a record of its current capital stock in the variable self.k.
• The h method implements the right-hand side of (8.1).
• The update method uses h to update capital as per (8.1).
– Notice how inside update the reference to the local method h is self.h.
The methods steady_state and generate_sequence are fairly self-explanatory
class Solow:
    r"""
    Implements the Solow growth model with the update rule

        k_{t+1} = [(s z k_t^α) + (1 - δ)k_t] / (1 + n)

    """
    def __init__(self, n=0.05,  # population growth rate
                       s=0.25,  # savings rate
                       δ=0.1,   # depreciation rate
                       α=0.3,   # share of capital
                       z=2.0,   # productivity
                       k=1.0):  # current capital stock

        self.n, self.s, self.δ, self.α, self.z = n, s, δ, α, z
        self.k = k
    def h(self):
        "Evaluate the h function"
        # Unpack parameters (get rid of self to simplify notation)
        n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
        # Apply the update rule
        return (s * z * self.k**α + (1 - δ) * self.k) / (1 + n)

    def update(self):
        "Update the current state (i.e., the capital stock)."
        self.k = self.h()

    def steady_state(self):
        "Compute the steady state value of capital."
        # Unpack parameters (get rid of self to simplify notation)
        n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
        # Compute and return steady state
        return ((s * z) / (n + δ))**(1 / (1 - α))

    def generate_sequence(self, t):
        "Generate and return a time series of length t"
        path = []
        for i in range(t):
            path.append(self.k)
            self.update()
        return path
Here’s a little program that uses the class to compute time series from two different initial conditions.
The common steady state is also plotted for comparison
s1 = Solow()
s2 = Solow(k=8.0)

T = 60
fig, ax = plt.subplots(figsize=(9, 6))

# Plot the common steady state value of capital
ax.plot([s1.steady_state()] * T, 'k-', label='steady state')

# Plot time series for each economy
for s in s1, s2:
    lb = f'capital series from initial state {s.k}'
    ax.plot(s.generate_sequence(T), 'o-', lw=2, alpha=0.6, label=lb)

ax.set_xlabel('$t$', fontsize=14)
ax.set_ylabel('$k_t$', fontsize=14)
ax.legend()
plt.show()
Next, let's write a class for a competitive market in which buyers and sellers are both price takers.
The market consists of the following objects:
• A linear demand curve $Q = a_d - b_d p$
• A linear supply curve $Q = a_z + b_z (p - t)$
Here
• 𝑝 is price paid by the buyer, 𝑄 is quantity and 𝑡 is a per-unit tax.
• Other symbols are demand and supply parameters.
The class provides methods to compute various values of interest, including competitive equilibrium price and quantity,
tax revenue raised, consumer surplus and producer surplus.
Here’s our implementation.
(It uses a function from SciPy called quad for numerical integration—a topic we will say more about later on.)
from scipy.integrate import quad

class Market:

    def __init__(self, ad, bd, az, bz, tax):
        """
        Set up market parameters. All parameters are scalars.
        """
        self.ad, self.bd, self.az, self.bz, self.tax = ad, bd, az, bz, tax
        if ad < az:
            raise ValueError('Insufficient demand.')
    def price(self):
        "Compute equilibrium price"
        return (self.ad - self.az + self.bz * self.tax) / (self.bd + self.bz)

    def quantity(self):
        "Compute equilibrium quantity"
        return self.ad - self.bd * self.price()

    def consumer_surp(self):
        "Compute consumer surplus"
        # == Compute area under inverse demand function == #
        integrand = lambda x: (self.ad / self.bd) - (1 / self.bd) * x
        area, error = quad(integrand, 0, self.quantity())
        return area - self.price() * self.quantity()

    def producer_surp(self):
        "Compute producer surplus"
        # == Compute area above inverse supply curve, excluding tax == #
        integrand = lambda x: -(self.az / self.bz) + (1 / self.bz) * x
        area, error = quad(integrand, 0, self.quantity())
        return (self.price() - self.tax) * self.quantity() - area

    def taxrev(self):
        "Compute tax revenue"
        return self.tax * self.quantity()

    def inverse_demand(self, x):
        "Compute inverse demand"
        return self.ad / self.bd - (1 / self.bd) * x

    def inverse_supply(self, x):
        "Compute inverse supply curve"
        return -(self.az / self.bz) + (1 / self.bz) * x + self.tax

    def inverse_supply_no_tax(self, x):
        "Compute inverse supply curve without tax"
        return -(self.az / self.bz) + (1 / self.bz) * x
Here's a short program that uses this class to plot an inverse demand curve together with inverse supply curves with and without taxes

baseline_params = 15, .5, -2, .5, 3
m = Market(*baseline_params)
q_max = m.quantity() * 2
q_grid = np.linspace(0.0, q_max, 100)
pd = m.inverse_demand(q_grid)
ps = m.inverse_supply(q_grid)
psno = m.inverse_supply_no_tax(q_grid)
fig, ax = plt.subplots()
ax.plot(q_grid, pd, lw=2, alpha=0.6, label='demand')
ax.plot(q_grid, ps, lw=2, alpha=0.6, label='supply')
ax.plot(q_grid, psno, '--k', lw=2, alpha=0.6, label='supply without tax')
ax.set_xlabel('quantity', fontsize=14)
ax.set_xlim(0, q_max)
ax.set_ylabel('price', fontsize=14)
ax.legend(loc='lower right', frameon=False, fontsize=14)
plt.show()
The next program provides a function that computes the deadweight loss from the imposition of the tax

def deadw(m):
    "Computes deadweight loss for market m."
    # == Create analogous market with no tax == #
    m_no_tax = Market(m.ad, m.bd, m.az, m.bz, 0)
    # == Compare surplus, return difference == #
    surp1 = m_no_tax.consumer_surp() + m_no_tax.producer_surp()
    surp2 = m.consumer_surp() + m.producer_surp() + m.taxrev()
    return surp1 - surp2
Here's an example of usage

deadw(m)  # Show deadweight loss for the market m created above

1.125
Let’s look at one more example, related to chaotic dynamics in nonlinear systems.
A simple transition rule that can generate erratic time paths is the logistic map

$$x_{t+1} = r x_t (1 - x_t), \qquad x_0 \in [0, 1], \; r \in [0, 4]$$
Let’s write a class for generating time series from this model.
Here’s one implementation
class Chaos:
"""
Models the dynamical system :math:`x_{t+1} = r x_t (1 - x_t)`
"""
    def __init__(self, x0, r):
        """
        Initialize with state x0 and parameter r
        """
        self.x, self.r = x0, r

    def update(self):
        "Apply the map to update state."
        self.x = self.r * self.x * (1 - self.x)

    def generate_sequence(self, n):
        "Generate and return a sequence of length n."
        path = []
        for i in range(n):
            path.append(self.x)
            self.update()
        return path
ch = Chaos(0.1, 4.0)
ts_length = 250
fig, ax = plt.subplots()
ax.set_xlabel('$t$', fontsize=14)
ax.set_ylabel('$x_t$', fontsize=14)
x = ch.generate_sequence(ts_length)
ax.plot(range(ts_length), x, 'bo-', alpha=0.5, lw=2, label='$x_t$')
plt.show()
The next piece of code provides a bifurcation diagram, showing the long-run behavior of the system as the parameter r varies

fig, ax = plt.subplots()
ch = Chaos(0.1, 4)
r = 2.5
while r < 4:
    ch.r = r
    t = ch.generate_sequence(1000)[950:]
    ax.plot([r] * len(t), t, 'b.', ms=0.6)
    r = r + 0.005
ax.set_xlabel('$r$', fontsize=16)
ax.set_ylabel('$x_t$', fontsize=16)
plt.show()
8.4 Special Methods

Python provides special methods that come in handy. For example, recall that lists and tuples have a notion of length, which can be queried via the len function

x = (10, 20)
len(x)

2
If you want to provide a return value for the len function when applied to your user-defined object, use the __len__
special method
class Foo:
def __len__(self):
return 42
Now we get
f = Foo()
len(f)
42
Another special method is __call__, which makes instances of a class callable, just like functions

class Foo:

    def __call__(self, x):
        return x + 42

f = Foo()
f(8)  # Exactly equivalent to f.__call__(8)

50
8.5 Exercises
Exercise 8.5.1
The empirical cumulative distribution function (ecdf) corresponding to a sample {𝑋𝑖 }𝑛𝑖=1 is defined as
$$F_n(x) := \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \leq x\} \qquad (x \in \mathbb{R}) \tag{8.3}$$
Here 1{𝑋𝑖 ≤ 𝑥} is an indicator function (one if 𝑋𝑖 ≤ 𝑥 and zero otherwise) and hence 𝐹𝑛 (𝑥) is the fraction of the
sample that falls below 𝑥.
The Glivenko–Cantelli Theorem states that, provided that the sample is IID, the ecdf 𝐹𝑛 converges to the true distribution
function 𝐹 .
Implement 𝐹𝑛 as a class called ECDF, where
• A given sample {𝑋𝑖 }𝑛𝑖=1 are the instance data, stored as self.observations.
• The class implements a __call__ method that returns 𝐹𝑛 (𝑥) for any 𝑥.
Your code should work as follows (modulo randomness)

from random import uniform

samples = [uniform(0, 1) for i in range(10)]
F = ECDF(samples)
F(0.5)  # Evaluate ecdf at x = 0.5

F.observations = [uniform(0, 1) for i in range(1000)]
F(0.5)

Here's one solution

class ECDF:

    def __init__(self, observations):
        self.observations = observations

    def __call__(self, x):
        counter = 0.0
        for obs in self.observations:
            if obs <= x:
                counter += 1
        return counter / len(self.observations)

# == test == #

from random import uniform

samples = [uniform(0, 1) for i in range(10)]
F = ECDF(samples)

print(F(0.5))  # Evaluate ecdf at x = 0.5

F.observations = [uniform(0, 1) for i in range(1000)]

print(F(0.5))

0.4
0.481
Exercise 8.5.2
In an earlier exercise, you wrote a function for evaluating polynomials.
This exercise is an extension, where the task is to build a simple class called Polynomial for representing and manip-
ulating polynomial functions such as
$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_N x^N = \sum_{n=0}^{N} a_n x^n \qquad (x \in \mathbb{R}) \tag{8.4}$$
The instance data for the class Polynomial will be the coefficients (in the case of (8.4), the numbers 𝑎0 , … , 𝑎𝑁 ).
Provide methods that
1. Evaluate the polynomial (8.4), returning 𝑝(𝑥) for any 𝑥.
2. Differentiate the polynomial, replacing the original coefficients with those of its derivative 𝑝′ .
Avoid using any import statements.
class Polynomial:

    def __init__(self, coefficients):
        """
        Creates an instance of the Polynomial class representing
        p(x) = a_0 x^0 + ... + a_N x^N,
        where a_i = coefficients[i].
        """
        self.coefficients = coefficients

    def __call__(self, x):
        "Evaluate the polynomial at x."
        y = 0
        for i, a in enumerate(self.coefficients):
            y += a * x**i
        return y

    def differentiate(self):
        "Reset self.coefficients to those of p' instead of p."
        new_coefficients = []
        for i, a in enumerate(self.coefficients):
            new_coefficients.append(i * a)
        # Remove the first element, which is zero
        del new_coefficients[0]
        # And reset coefficients data to new values
        self.coefficients = new_coefficients
        return new_coefficients
NINE

WRITING LONGER PROGRAMS
9.1 Overview
So far, we have explored the use of Jupyter Notebooks in writing and executing Python code.
While they are efficient and adaptable when working with short pieces of code, Notebooks are not the best choice for
longer programs and scripts.
Jupyter Notebooks are well suited to interactive computing (i.e. data science workflows) and can help execute chunks of
code one at a time.
Text files and scripts allow for long pieces of code to be written and executed in a single go.
We will explore the use of Python scripts as an alternative.
The Jupyter Lab and Visual Studio Code (VS Code) development environments are then introduced along with a primer
on version control (Git).
In this lecture, you will learn to
• work with Python scripts
• set up various development environments
• get started with GitHub
Note: Going forward, it is assumed that you have an Anaconda environment up and running.
You may want to create a new conda environment if you haven’t done so already.
Python files are used when writing long, reusable blocks of code - by convention, they have a .py suffix.
Let us begin by working with the following example.
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Sine Wave')
plt.show()

A variation wraps the plotting logic in a function with the title passed as a parameter (the function name here is illustrative)

def plot_wave(title='Sine Wave'):
    x = np.linspace(0, 10, 100)
    y = np.sin(x)

    plt.plot(x, y)
    plt.xlabel('x')
    plt.ylabel('y')
    plt.title(title)
    plt.show()
This allows you to split your code into chunks and structure your codebase better.
Look into the use of modules and packages for more information on importing functionality.
JupyterLab is a browser based development environment for Jupyter Notebooks, code scripts, and data files.
You can try JupyterLab in the browser if you want to test it out before installing it locally.
You can install JupyterLab using pip

> pip install jupyterlab

and launch it from the terminal with

> jupyter-lab
You can see that the Jupyter Server is running on port 8888 on the localhost.
The following interface should open up on your default browser automatically - if not, CTRL + Click the server URL.
Click on
• the Python 3 (ipykernel) button under Notebooks to open a new Jupyter Notebook
• the Python File button to open a new Python script (.py)
You can always open this launcher tab by clicking the ‘+’ button on the top.
All the files and folders in your working directory can be found in the File Browser (tab on the left).
You can create new files and folders using the buttons available at the top of the File Browser tab.
You can install extensions that increase the functionality of JupyterLab by visiting the Extensions tab.
Coming back to the example scripts from earlier, there are two ways to work with them in JupyterLab.
• Using magic commands
• Using the terminal
Jupyter Notebooks and JupyterLab support the use of magic commands - commands that extend the capabilities of a
standard Jupyter Notebook.
The %run magic command allows you to run a Python script from within a Notebook.
This is a convenient way to run scripts that you are working on in the same directory as your Notebook and present the
outputs within the Notebook.
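For example, assuming the sine wave script from earlier was saved as sine_wave.py (the filename here is hypothetical):

%run sine_wave.py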
However, if you are looking into just running the .py file, it is sometimes easier to use the terminal.
Open a terminal from the launcher and run

> python <path to file.py>
Note: You can also run the script line by line by opening an ipykernel console either
• from the launcher
• by right clicking within the Notebook and selecting Create Console for Editor
Use Shift + Enter to run a line of code.
Visual Studio Code (VS Code) is a code editor and development workspace that can run
• in the browser.
• as a local installation.
Both interfaces are identical.
When you launch VS Code, you will see the following interface.
Explore how to customize VS Code to your liking through the guided walkthroughs.
When presented with the following prompt, go ahead and install all recommended extensions.
You can also install extensions from the Extensions tab.
Jupyter Notebooks (.ipynb files) can be worked on in VS Code.
Make sure to install the Jupyter extension from the Extensions tab before you try to open a Jupyter Notebook.
Create a new file (in the file Explorer tab) and save it with the .ipynb extension.
Choose a kernel/environment to run the Notebook in by clicking on the Select Kernel button on the top right corner of
the editor.
VS Code also has excellent version control functionality through the Source Control tab.
Link your GitHub account to VS Code to push and pull changes to and from your repositories.
Further discussions about version control can be found in the next section.
To open a new Terminal in VS Code, click on the Terminal tab and select New Terminal.
VS Code opens a new Terminal in the same directory you are working in - a PowerShell in Windows and a Bash in Linux.
You can change the shell or open a new instance through the dropdown menu on the right end of the terminal tab.
VS Code helps you manage conda environments without using the command line.
Open the Command Palette (CTRL + SHIFT + P or from the dropdown menu under View tab) and search for Python:
Select Interpreter.
This loads existing environments.
You can also create new environments using Python: Create Environment in the Command Palette.
A new environment (.conda folder) is created in the current working directory.
Coming to the example scripts from earlier, there are again two ways to work with them in VS Code.
• Using the run button
• Using the terminal
You can run the script by clicking on the run button on the top right corner of the editor.
You can also run the script interactively by selecting the Run Current File in Interactive Window option from the
dropdown.
The command python <path to file.py> is executed on the console of your choice.
If you are using a Windows machine, you can either use the Anaconda Prompt or the Command Prompt - but, generally
not the PowerShell.
Here’s an execution of the earlier code.
Note: If you would like to develop packages and build tools using Python, you may want to look into the use of Docker
containers and VS Code.
However, this is outside the focus of these lectures.
Git is an extremely powerful tool for distributed collaboration — for example, we use it to share and synchronize all the
source files for these lectures.
There are two main flavors of Git
1. the plain vanilla command line Git version
2. the various point-and-click GUI versions
• See, for example, the GitHub version or Git GUI integrated into your IDE.
In case you haven't already, try
1. Installing Git.
2. Getting a copy of QuantEcon.py using Git.
For example, if you've installed the command line version, open up a terminal and enter

> git clone https://github.com/QuantEcon/QuantEcon.py

(This is just git clone in front of the URL for the repository)
This command will download all necessary components to rebuild the lecture you are reading now.
As a second task,
1. Sign up to GitHub.
2. Look into ‘forking’ GitHub repositories (forking means making your own copy of a GitHub repository, stored on
GitHub).
3. Fork QuantEcon.py.
4. Clone your fork to some local directory, make edits, commit them, and push them back up to your forked GitHub
repo.
5. If you made a valuable improvement, send us a pull request!
CHAPTER

TEN

PYTHON FOR SCIENTIFIC COMPUTING
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of
all evil.” – Donald Knuth
10.1 Overview
Let’s briefly review Python’s scientific libraries, starting with why we need them.
10.2 Scientific Libraries

One obvious reason we use scientific libraries is because they implement routines we want to use.
For example, it’s almost always better to use an existing routine for root finding than to write a new one from scratch.
(For standard algorithms, efficiency is maximized if the community can coordinate on a common set of implementations,
written by experts and tuned by users to be as fast and robust as possible.)
But this is not the only reason that we use Python’s scientific libraries.
Another is that pure Python, while flexible and elegant, is not fast.
So we need libraries that are designed to accelerate execution of Python code.
As we’ll see below, there are now Python libraries that can do this extremely well.
In terms of popularity, the big four in the world of scientific Python libraries are
• NumPy
• SciPy
• Matplotlib
• Pandas
For us, there’s another (relatively new) library that will also be essential for numerical computing:
• Numba
Over the next few lectures we’ll see how to use these libraries.
But first, let’s quickly review how they fit together.
• NumPy forms the foundations by providing a basic array data type (think of vectors and matrices) and functions
for acting on these arrays (e.g., matrix multiplication).
• SciPy builds on NumPy by adding the kinds of numerical methods that are routinely used in science (interpolation,
optimization, root finding, etc.).
• Matplotlib is used to generate figures, with a focus on plotting data stored in NumPy arrays.
• Pandas provides types and functions for empirical work (e.g., manipulating data).
• Numba accelerates execution via JIT compilation — we’ll learn about this soon.
10.3 The Need for Speed

The upside is that, compared to low-level languages, Python is typically faster to write, less error-prone and easier to
debug.
The downside is that Python is harder to optimize — that is, turn into fast machine code — than languages like C or
Fortran.
Indeed, the standard implementation of Python (called CPython) cannot match the speed of compiled languages such as
C or Fortran.
Does that mean that we should just switch to C or Fortran for everything?
The answer is: No, no and one hundred times no!
(This is what you should say to the senior professor insisting that the model needs to be rewritten in Fortran or C++.)
There are two reasons why:
First, for any given program, relatively few lines are ever going to be time-critical.
Hence it is far more efficient to write most of our code in a high productivity language like Python.
Second, even for those lines of code that are time-critical, we can now achieve the same speed as C or Fortran using
Python’s scientific libraries.
Before we learn how to do this, let’s try to understand why plain vanilla Python is slower than C or Fortran.
This will, in turn, help us figure out how to speed things up.
Dynamic Typing

Consider this code, executed in Python

a, b = 10, 10
a + b
20
Even for this simple operation, the Python interpreter has a fair bit of work to do.
For example, in the statement a + b, the interpreter has to know which operation to invoke.
If a and b are strings, then a + b requires string concatenation
a, b = 'foo', 'bar'
a + b
'foobar'
a, b = ['foo'], ['bar']
a + b
['foo', 'bar']
(We say that the operator + is overloaded — its action depends on the type of the objects on which it acts)
As a result, Python must check the type of the objects and then call the correct operation.
This involves substantial overheads.
Static Types

Compiled languages avoid these overheads with explicit, static types. For example, consider the following C code, which sums the integers from 1 to 10

#include <stdio.h>

int main(void) {
    int i;
    int sum = 0;
    for (i = 1; i <= 10; i++) {
        sum = sum + i;
    }
    printf("sum = %d\n", sum);
    return 0;
}

The variables i and sum are explicitly declared to be integers, so the meaning of addition here is completely unambiguous.
Data Access

Another drag on speed for high-level languages is data access. To illustrate, consider the problem of summing a collection of integers.

In C or Fortran, these integers would typically be stored in an array, which is a simple data structure for storing homogeneous data.
Such an array is stored in a single contiguous block of memory
• In modern computers, memory addresses are allocated to each byte (one byte = 8 bits).
• For example, a 64 bit integer is stored in 8 bytes of memory.
• An array of 𝑛 such integers occupies 8𝑛 consecutive memory slots.
Moreover, the compiler is made aware of the data type by the programmer.
• In this case 64 bit integers
Hence, each successive data point can be accessed by shifting forward in memory space by a known and fixed amount.
• In this case 8 bytes
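NumPy arrays follow the same contiguous layout, and you can inspect this bookkeeping directly (a small illustration; itemsize and strides are standard ndarray attributes):

import numpy as np

z = np.zeros(5, dtype=np.int64)
z.itemsize   # 8: each 64 bit integer occupies 8 bytes
z.strides    # (8,): step forward 8 bytes to reach the next element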
10.4 Vectorization
There is a clever method called vectorization that can be used to speed up high level languages in numerical applications.
The key idea is to send array processing operations in batch to pre-compiled and efficient native machine code.
The machine code itself is typically compiled from carefully optimized C or Fortran.
For example, when working in a high level language, the operation of inverting a large matrix can be subcontracted to
efficient machine code that is pre-compiled for this purpose and supplied to users as part of a package.
This clever idea dates back to MATLAB, which uses vectorization extensively.
Vectorization can greatly accelerate many numerical computations (but not all, as we shall see).
Let’s see how vectorization works in Python, using NumPy.
import random
import numpy as np
import quantecon as qe
Next let’s try some non-vectorized code, which uses a native Python loop to generate, square and then sum a large number
of random variables:
n = 1_000_000

%%time

y = 0      # Will accumulate and store sum
for i in range(n):
    x = random.uniform(0, 1)
    y += x**2

CPU times: user 277 ms, sys: 310 μs, total: 277 ms
Wall time: 277 ms
%%time
x = np.random.uniform(0, 1, n)
y = np.sum(x**2)
CPU times: user 9.02 ms, sys: 201 μs, total: 9.22 ms
Wall time: 8.95 ms
As you can see, the second code block runs much faster. Why?
The second code block breaks the loop down into three basic operations
1. draw n uniforms
2. square them
3. sum them
These are sent as batch operators to optimized machine code.
Apart from minor overheads associated with sending data back and forth, the result is C or Fortran-like speed.
When we run batch operations on arrays like this, we say that the code is vectorized.
Vectorized code is typically fast and efficient.
It is also surprisingly flexible, in the sense that many operations can be vectorized.
The next section illustrates this point.
Many functions provided by NumPy are so-called universal functions — also called ufuncs.
This means that they
• map scalars into scalars, as expected
• map arrays into arrays, acting element-wise
For example, np.cos is a ufunc:
np.cos(1.0)
0.5403023058681398
np.cos(np.linspace(0, 1, 3))

array([1.        , 0.87758256, 0.54030231])
Now consider maximizing the function

def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

over the square $[-3, 3] \times [-3, 3]$ using a naive grid search. First, a version with plain Python loops:

grid = np.linspace(-3, 3, 1000)

%%time

m = -np.inf
for x in grid:
    for y in grid:
        z = f(x, y)
        if z > m:
            m = z
%%time
x, y = np.meshgrid(grid, grid)
np.max(f(x, y))
0.9999819641085747
In the vectorized version, all the looping takes place in compiled code.
As you can see, the second version is much faster.
(We’ll make it even faster again later on, using more scientific programming tricks.)
ELEVEN
NUMPY
“Let’s be clear: the work of science has nothing whatever to do with consensus. Consensus is the business
of politics. Science, on the contrary, requires only one investigator who happens to be right, which means
that he or she has results that are verifiable by reference to the real world. In science consensus is irrelevant.
What is relevant is reproducible results.” – Michael Crichton
11.1 Overview

NumPy is a first-rate library for numerical programming: widely used in academia, finance and industry, and mature, fast and stable. In this lecture, we introduce NumPy arrays and the fundamental array processing operations provided by the library.

11.1.1 References

• The official NumPy documentation.

11.2 NumPy Arrays

The essential problem that NumPy solves is fast array processing. The most important structure that NumPy defines is an array data type formally called a numpy.ndarray.

import numpy as np
a = np.zeros(3)
a
type(a)
numpy.ndarray
NumPy arrays are somewhat like native Python lists, except that
• Data must be homogeneous (all elements of the same type).
• These types must be one of the data types (dtypes) provided by NumPy.
The most important of these dtypes are:
• float64: 64 bit floating-point number
• int64: 64 bit integer
• bool: 8 bit True or False
There are also dtypes to represent complex numbers, unsigned integers, etc.
On modern machines, the default dtype for arrays is float64
a = np.zeros(3)
type(a[0])
numpy.float64
a = np.zeros(3, dtype=int)
type(a[0])
numpy.int64
z = np.zeros(10)
Here z is a flat array with no dimension — neither row nor column vector.
The dimension is recorded in the shape attribute, which is a tuple
z.shape
(10,)
Here the shape tuple has only one element, which is the length of the array (tuples with one element end with a comma).
To give it dimension, we can change the shape attribute
z.shape = (10, 1)
z
array([[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.]])
z = np.zeros(4)
z.shape = (2, 2)
z
array([[0., 0.],
[0., 0.]])
In the last case, to make the 2 by 2 array, we could also pass a tuple to the zeros() function, as in z = np.zeros((2,
2)).
The np.empty function creates arrays in memory that can later be populated with data

z = np.empty(3)
z

(The values displayed are garbage, i.e., whatever numbers happened to sit in those memory slots.)
z = np.identity(2)
z
array([[1., 0.],
[0., 1.]])
In addition, NumPy arrays can be created from Python lists, tuples, etc. using np.array

z = np.array([10, 20])  # ndarray from Python list
z

array([10, 20])

type(z)

numpy.ndarray

z = np.array((10, 20), dtype=float)  # Here 'float' is equivalent to 'np.float64'
z

array([10., 20.])

z = np.array([[1, 2], [3, 4]])  # 2D array from a list of lists
z

array([[1, 2],
       [3, 4]])
See also np.asarray, which performs a similar function, but does not make a distinct copy of data already in a NumPy
array.
na = np.linspace(10, 20, 2)
na is np.asarray(na) # Does not copy NumPy arrays
True

na is np.array(na)  # Does make a new copy

False
To read in the array data from a text file containing numeric data use np.loadtxt or np.genfromtxt—see the
documentation for details.
For a flat array, indexing works in the same way as for Python sequences

z = np.linspace(1, 2, 5)
z

array([1.  , 1.25, 1.5 , 1.75, 2.  ])

z[0]

1.0

z[0:2]  # Two elements, starting at element 0

array([1.  , 1.25])

z[-1]

2.0

For 2D arrays the index syntax is as follows

z = np.array([[1, 2], [3, 4]])
z

array([[1, 2],
       [3, 4]])

z[0, 0]

1

z[0, 1]

2
And so on.
Note that indices are still zero-based, to maintain compatibility with Python sequences.
Columns and rows can be extracted as follows
z[0, :]
array([1, 2])
z[:, 1]
array([2, 4])
Indices can also be supplied as arrays

z = np.linspace(2, 4, 5)
z

array([2. , 2.5, 3. , 3.5, 4. ])

indices = np.array((0, 2, 3))  # Extract elements at these positions
z[indices]

array([2. , 3. , 3.5])

An array of dtype bool can likewise be used to extract elements

d = np.array([0, 1, 1, 0, 0], dtype=bool)
d

array([False,  True,  True, False, False])

z[d]

array([2.5, 3. ])
An aside: all elements of an array can be set equal to one number using slice notation

z = np.empty(3)
z

(garbage values are displayed)

z[:] = 42
z

array([42., 42., 42.])
Arrays have useful methods, all of which are carefully optimized

a = np.array((4, 3, 2, 1))
a

array([4, 3, 2, 1])

a.sort()  # Sorts a in place
a

array([1, 2, 3, 4])

a.sum()  # Sum

10

a.mean()  # Mean

2.5

a.max()  # Max

4

a.cumsum()  # Cumulative sum of the elements of a

array([ 1,  3,  6, 10])

a.cumprod()  # Cumulative product of the elements of a

array([ 1,  2,  6, 24])

a.var()  # Variance

1.25

a.std()  # Standard deviation

1.118033988749895
a.shape = (2, 2)
a.T # Equivalent to a.transpose()
array([[1, 3],
[2, 4]])
Another useful method is searchsorted(): if z is a nondecreasing array, then z.searchsorted(a) returns the index of the first element of z that is >= a

z = np.linspace(2, 4, 5)
z

array([2. , 2.5, 3. , 3.5, 4. ])

z.searchsorted(2.2)

1
Many of the methods discussed above have equivalent functions in the NumPy namespace
a = np.array((4, 3, 2, 1))
np.sum(a)
10
np.mean(a)
2.5
11.3 Arithmetic Operations

The operators +, -, *, / and ** all act element-wise on arrays

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
a + b

array([ 6,  8, 10, 12])

a * b

array([ 5, 12, 21, 32])

We can add a scalar to each element as follows

a + 10

array([11, 12, 13, 14])

Scalar multiplication is similar

a * 10

array([10, 20, 30, 40])
A = np.ones((2, 2))
B = np.ones((2, 2))
A + B
array([[2., 2.],
[2., 2.]])
A + 10
array([[11., 11.],
[11., 11.]])
A * B
array([[1., 1.],
[1., 1.]])
11.4 Matrix Multiplication

With Anaconda's scientific Python package based around Python 3.5 and above, one can use the @ symbol for matrix multiplication, as follows:
A = np.ones((2, 2))
B = np.ones((2, 2))
A @ B
array([[2., 2.],
[2., 2.]])
(For older versions of Python and NumPy you need to use the np.dot function)
We can also use @ to take the inner product of two flat arrays
A = np.array((1, 2))
B = np.array((10, 20))
A @ B
50
In fact, we can use @ when one of the operands is a Python list or tuple

A = np.array(((1, 2), (3, 4)))
A

array([[1, 2],
       [3, 4]])

A @ (0, 1)

array([2, 4])
11.5 Broadcasting
Note: Broadcasting is a very important aspect of NumPy. At the same time, advanced broadcasting is relatively complex
and some of the details below can be skimmed on first pass.
Here's a first example

a = np.array(
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]])
b = np.array([3, 6, 9])

a + b
array([[ 4, 8, 12],
[ 7, 11, 15],
[10, 14, 18]])
b.shape = (3, 1)
a + b
array([[ 4, 5, 6],
[10, 11, 12],
[16, 17, 18]])
The broadcasting operation above is equivalent to the following explicit for loop

row, column = a.shape
result = np.empty((3, 3))
for i in range(row):
    for j in range(column):
        result[i, j] = a[i, j] + b[i, 0]  # b now has shape (3, 1)

result

array([[ 4.,  5.,  6.],
       [10., 11., 12.],
       [16., 17., 18.]])
a = np.array([3, 6, 9])
b = np.array([2, 3, 4])
b.shape = (3, 1)
a + b
array([[ 5, 8, 11],
[ 6, 9, 12],
[ 7, 10, 13]])
a = np.array(
[[1, 2],
[4, 5],
[7, 8]])
b = np.array([3, 6, 9])
a + b
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[69], line 7
      1 a = np.array(
      2     [[1, 2],
    ...
----> 7 a + b

ValueError: operands could not be broadcast together with shapes (3,2) (3,)
We can see that NumPy cannot expand the arrays to the same size.
It is because, when b is expanded from b -> (3,) to b -> (3, 3), NumPy cannot match b with a -> (3, 2).
Things get even trickier when we move to higher dimensions.
To help us, we can use the following list of rules:
• Step 1: When the dimensions of two arrays do not match, NumPy will expand the one with fewer dimensions by
adding dimension(s) on the left of the existing dimensions.
– For example, if a -> (3, 3) and b -> (3,), then broadcasting will add a dimension to the left so that
b -> (1, 3);
– If a -> (2, 2, 2) and b -> (2, 2), then broadcasting will add a dimension to the left so that b
-> (1, 2, 2);
– If a -> (3, 2, 2) and b -> (2,), then broadcasting will add two dimensions to the left so that b
-> (1, 1, 2) (you can also see this process as going through Step 1 twice).
• Step 2: When the two arrays have the same dimension but different shapes, NumPy will try to expand dimensions
where the shape index is 1.
– For example, if a -> (1, 3) and b -> (3, 1), then broadcasting will expand dimensions with shape
1 in both a and b so that a -> (3, 3) and b -> (3, 3);
– If a -> (2, 2, 2) and b -> (1, 2, 2), then broadcasting will expand the first dimension of b so
that b -> (2, 2, 2);
– If a -> (3, 2, 2) and b -> (1, 1, 2), then broadcasting will expand b on all dimensions with
shape 1 so that b -> (3, 2, 2).
Here are code examples for broadcasting higher dimensional arrays
a = np.array(
    [[[1, 2],
      [2, 3]],
     [[2, 3],
      [3, 4]]])
print(f'the shape of array a is {a.shape}')

the shape of array a is (2, 2, 2)

b = np.array(
    [[1,7],
     [7,1]])
print(f'the shape of array b is {b.shape}')

the shape of array b is (2, 2)

a + b
array([[[ 2, 9],
[ 9, 4]],
[[ 3, 10],
[10, 5]]])
a = np.array(
[[[1, 2],
[3, 4]],
[[4, 5],
[6, 7]],
[[7, 8],
[9, 10]]])
print(f'the shape of array a is {a.shape}')
b = np.array([3, 6])
print(f'the shape of array b is {b.shape}')
a + b
array([[[ 4, 8],
[ 6, 10]],
[[ 7, 11],
[ 9, 13]],
[[10, 14],
[12, 16]]])
• Step 3: After Step 1 and 2, if the two arrays still do not match, a ValueError will be raised. For example,
a = np.array(
[[[1, 2, 3],
[2, 3, 4]],
[[2, 3, 4],
[3, 4, 5]]])
print(f'the shape of array a is {a.shape}')
b = np.array(
[[1,7],
[7,1]])
print(f'the shape of array b is {b.shape}')
a + b
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[73], line 14
9 b = np.array(
10 [[1,7],
11 [7,1]])
12 print(f'the shape of array b is {b.shape}')
---> 14 a + b
ValueError: operands could not be broadcast together with shapes (2,2,3) (2,2)
11.6 Mutability and Copying Arrays

NumPy arrays are mutable data types, like Python lists. In other words, their contents can be altered (mutated) in memory after initialization. Here's an example

a = np.array([42, 44])
a

array([42, 44])

a[-1] = 0  # Change last element to 0
a

array([42,  0])
Mutability leads to the following behavior (which can be shocking to MATLAB programmers…)
a = np.random.randn(3)
a

b = a
b[0] = 0.0
a

What's happened here is that a and b are two names for the same array, so changing b changes a. If you need a separate copy, use np.copy

a = np.random.randn(3)
a

b = np.copy(a)
b

Changing the copy leaves the original unaffected

b[:] = 1
b

a
11.7 Additional Functionality

11.7.1 Vectorized Functions

NumPy provides versions of the standard functions log, exp, sin, etc. that act element-wise on arrays

z = np.array([1, 2, 3])
np.sin(z)

array([0.84147098, 0.90929743, 0.14112001])
This eliminates the need for explicit element-by-element loops such as

n = len(z)
y = np.empty(n)
for i in range(n):
    y[i] = np.sin(z[i])
Because they act element-wise on arrays, these functions are called vectorized functions.
In NumPy-speak, they are also called ufuncs, which stands for “universal functions”.
As we saw above, the usual arithmetic operations (+, *, etc.) also work element-wise, and combining these with the
ufuncs gives a very large set of fast element-wise functions.
z

array([1, 2, 3])

(1 / np.sqrt(2 * np.pi)) * np.exp(- 0.5 * z**2)

array([0.24197072, 0.05399097, 0.00443185])

Not all user-defined functions will act element-wise. For example, passing the following function a NumPy array causes an error

def f(x):
    return 1 if x > 0 else 0

The NumPy function np.where provides a vectorized alternative

x = np.random.randn(4)
x

np.where(x > 0, 1, 0)  # Insert 1 if x > 0 true, otherwise 0

array([0, 0, 1, 0])

You can also use np.vectorize to vectorize a given function

f = np.vectorize(f)
f(x)  # Passing the same vector x as in the previous example

array([0, 0, 1, 0])
However, this approach doesn’t always obtain the same speed as a more carefully crafted vectorized function.
11.7.2 Comparisons
As a rule, comparisons on arrays are done element-wise

z = np.array([2, 3])
y = np.array([2, 3])
z == y

array([ True,  True])

y[0] = 5
z == y

array([False,  True])

z != y

array([ True, False])

The situation is similar for >, <, >= and <=. We can also do comparisons against scalars

z = np.linspace(0, 10, 5)
z

array([ 0. ,  2.5,  5. ,  7.5, 10. ])

z > 3

array([False, False,  True,  True,  True])

This is particularly useful for conditional extraction

b = z > 3
b

array([False, False,  True,  True,  True])

z[b]

array([ 5. ,  7.5, 10. ])

Of course we can, and frequently do, perform this in one step

z[z > 3]

array([ 5. ,  7.5, 10. ])
11.7.3 Sub-packages
NumPy provides some additional functionality related to scientific programming through its sub-packages.
We've already seen how we can generate random variables using np.random

z = np.random.randn(10000)                   # Generate standard normals
y = np.random.binomial(10, 0.5, size=1000)   # 1,000 draws from Bin(10, 0.5)
y.mean()

4.989

Another commonly used subpackage is np.linalg

A = np.array([[1, 2], [3, 4]])

np.linalg.det(A)  # Compute the determinant

-2.0000000000000004

np.linalg.inv(A)  # Compute the inverse

array([[-2. ,  1. ],
       [ 1.5, -0.5]])
Much of this functionality is also available in SciPy, a collection of modules that are built on top of NumPy.
11.8 Exercises
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (10,6)
Exercise 11.8.1
Consider the polynomial expression
$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_N x^N = \sum_{n=0}^{N} a_n x^n \tag{11.1}$$
Earlier, you wrote a simple function p(x, coeff) to evaluate (11.1) without considering efficiency.
Now write a new function that does the same job, but uses NumPy arrays and array operations for its computations, rather
than any form of Python loop.
(Such functionality is already implemented as np.poly1d, but for the sake of the exercise don’t use this class)
Here's a solution

def p(x, coef):
    X = np.ones_like(coef)
    X[1:] = x
    y = np.cumprod(X)   # y = [1, x, x**2,...]
    return coef @ y

Let's test it
x = 2
coef = np.linspace(2, 4, 3)
print(coef)
print(p(x, coef))
# For comparison
q = np.poly1d(np.flip(coef))
print(q(x))
[2. 3. 4.]
24.0
24.0
Exercise 11.8.2
Let q be a NumPy array of length n with q.sum() == 1.
Suppose that q represents a probability mass function.
We wish to generate a discrete random variable 𝑥 such that ℙ{𝑥 = 𝑖} = 𝑞𝑖 .
In other words, x takes values in range(len(q)) and x = i with probability q[i].
The standard (inverse transform) algorithm is as follows:
• Divide the unit interval [0, 1] into 𝑛 subintervals 𝐼0 , 𝐼1 , … , 𝐼𝑛−1 such that the length of 𝐼𝑖 is 𝑞𝑖 .
• Draw a uniform random variable 𝑈 on [0, 1] and return the 𝑖 such that 𝑈 ∈ 𝐼𝑖 .
The probability of drawing 𝑖 is the length of 𝐼𝑖 , which is equal to 𝑞𝑖 .
We can implement the algorithm as follows
from random import uniform

def sample(q):
    a = 0.0
    U = uniform(0, 1)
    for i in range(len(q)):
        if a < U <= a + q[i]:
            return i
        a = a + q[i]
If you can’t see how this works, try thinking through the flow for a simple example, such as q = [0.25, 0.75] It
helps to sketch the intervals on paper.
Your exercise is to speed it up using NumPy, avoiding explicit loops
class DiscreteRV:
    """
    Generates an array of draws from a discrete random variable with vector of
    probabilities given by q.
    """

    def __init__(self, q):
        """
        The argument q is a NumPy array, or array like, nonnegative and sums
        to 1
        """
        self.q = q
        self.Q = np.cumsum(q)

    def draw(self, k=1):
        """
        Returns k draws from q. For each such draw, the value i is returned
        with probability q[i].
        """
        return self.Q.searchsorted(np.random.uniform(0, 1, size=k))
The logic is not obvious, but if you take your time and read it slowly, you will understand.
There is a problem here, however.
Suppose that q is altered after an instance of DiscreteRV is created, for example by
q = (0.1, 0.9)
d = DiscreteRV(q)
d.q = (0.5, 0.5)
The problem is that Q does not change accordingly, and Q is the data used in the draw method.
To deal with this, one option is to compute Q every time the draw method is called.
But this is inefficient relative to computing Q once-off.
A better option is to use descriptors.
A solution from the quantecon library using descriptors that behaves as we desire can be found here.
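One way to get this behavior with tools we have already met is a property, which is itself implemented via descriptors (a sketch, not the quantecon implementation):

import numpy as np

class DiscreteRV:

    def __init__(self, q):
        self.q = q  # Triggers the setter below, which also builds Q

    @property
    def q(self):
        return self._q

    @q.setter
    def q(self, val):
        self._q = np.asarray(val)
        self.Q = np.cumsum(self._q)   # Keep Q in sync with q

    def draw(self, k=1):
        return self.Q.searchsorted(np.random.uniform(0, 1, size=k))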
Exercise 11.8.3
Recall our earlier discussion of the empirical cumulative distribution function.
Your task is to
1. Make the __call__ method more efficient using NumPy.
2. Add a method that plots the ECDF over [𝑎, 𝑏], where 𝑎 and 𝑏 are method parameters.
"""
Modifies ecdf.py from QuantEcon to add in a plot method
class ECDF:
"""
One-dimensional empirical distribution function given a vector of
observations.
Parameters
----------
observations : array_like
An array of observations
Attributes
----------
observations : array_like
An array of observations
"""
Parameters
----------
x : scalar(float)
The x at which the ecdf is evaluated
Returns
-------
scalar(float)
Fraction of the sample less than x
"""
return np.mean(self.observations <= x)
Parameters
----------
a : scalar(float), optional(default=None)
Lower endpoint of the plot interval
b : scalar(float), optional(default=None)
Upper endpoint of the plot interval
"""
fig, ax = plt.subplots()
X = np.random.randn(1000)
F = ECDF(X)
F.plot(ax)
Exercise 11.8.4
Recall that broadcasting in NumPy can help us conduct element-wise operations on arrays with different numbers of dimensions without using for loops.
In this exercise, try to use for loops to replicate the result of the following broadcasting operations.
Part 1: Try to replicate this simple example using for loops and compare your results with the broadcasting operation below.
np.random.seed(123)
x = np.random.randn(4, 4)
y = np.random.randn(4)
A = x / y
print(A)
Part 2: Move on to replicate the result of the following broadcasting operation. Meanwhile, compare the speeds of broadcasting and the for loop you implement.
import quantecon as qe
np.random.seed(123)
x = np.random.randn(1000, 100, 100)
y = np.random.randn(100)
qe.tic()
B = x / y
qe.toc()
0.012328624725341797
print(B)
Part 1 Solution

np.random.seed(123)
x = np.random.randn(4, 4)
y = np.random.randn(4)
C = np.empty_like(x)
n = len(x)
for i in range(n):
for j in range(n):
C[i, j] = x[i, j] / y[j]
print(C)
print(np.array_equal(A, C))
True
Part 2 Solution
np.random.seed(123)
x = np.random.randn(1000, 100, 100)
y = np.random.randn(100)
qe.tic()
D = np.empty_like(x)
d1, d2, d3 = x.shape
for i in range(d1):
for j in range(d2):
for k in range(d3):
D[i, j, k] = x[i, j, k] / y[k]
qe.toc()
3.5061800479888916
Note that the for loop takes much longer than the broadcasting operation.
Compare the results to check your answer
print(D)
print(np.array_equal(B, D))
True
TWELVE
MATPLOTLIB
12.1 Overview
We’ve already generated quite a few figures in these lectures using Matplotlib.
Matplotlib is an outstanding graphics library, designed for scientific computing, with
• high-quality 2D and 3D plots
• output in all the usual formats (PDF, PNG, etc.)
• LaTeX integration
• fine-grained control over all aspects of presentation
• animation, etc.
Here’s the kind of easy example you might find in introductory treatments
This is simple and convenient, but also somewhat limited and un-Pythonic.
For example, in the function calls, a lot of objects get created and passed around without making themselves known to
the programmer.
Python programmers tend to prefer a more explicit style of programming (run import this in a code block and look
at the second line).
This leads us to the alternative, object-oriented Matplotlib API.
Here’s the code corresponding to the preceding figure using the object-oriented API
fig, ax = plt.subplots()
ax.plot(x, y, 'b-', linewidth=2)
plt.show()
12.2.3 Tweaks
fig, ax = plt.subplots()
ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend()
plt.show()
We’ve also used alpha to make the line slightly transparent—which makes it look smoother.
The location of the legend can be changed by replacing ax.legend() with ax.legend(loc='upper center').
fig, ax = plt.subplots()
ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend(loc='upper center')
plt.show()
fig, ax = plt.subplots()
ax.plot(x, y, 'r-', linewidth=2, label=r'$y=\sin(x)$', alpha=0.6)
ax.legend(loc='upper center')
plt.show()
fig, ax = plt.subplots()
ax.plot(x, y, 'r-', linewidth=2, label=r'$y=\sin(x)$', alpha=0.6)
ax.legend(loc='upper center')
ax.set_yticks([-1, 0, 1])
ax.set_title('Test plot')
plt.show()
Matplotlib has a huge array of functions and features, which you can discover over time as you have need for them.
We mention just a few.
from scipy.stats import norm
from numpy.random import uniform

fig, ax = plt.subplots()
x = np.linspace(-4, 4, 150)
for i in range(3):
    m, s = uniform(-1, 1), uniform(1, 2)
    y = norm.pdf(x, loc=m, scale=s)
    current_label = fr'$\mu = {m:.2}$'
    ax.plot(x, y, linewidth=2, alpha=0.6, label=current_label)
ax.legend()
plt.show()
num_rows, num_cols = 3, 2
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 12))
for i in range(num_rows):
    for j in range(num_cols):
        m, s = uniform(-1, 1), uniform(1, 2)
        x = norm.rvs(loc=m, scale=s, size=100)
        axes[i, j].hist(x, alpha=0.6, bins=20)
        t = fr'$\mu = {m:.2}, \quad \sigma = {s:.2}$'
        axes[i, j].set(title=t, xticks=[-4, 0, 4], yticks=[])
plt.show()
12.3.3 3D Plots
12.3.4 A Customizing Function

Perhaps you will find a set of customizations that you regularly use.
Suppose we usually prefer our axes to go through the origin, and to have a grid.
Here’s a nice example from Matthew Doty of how the object-oriented API can be used to build a custom subplots
function that implements these changes.
Read carefully through the code and see if you can follow what’s going on
def subplots():
    "Custom subplots with axes through the origin"
    fig, ax = plt.subplots()
    # Set the axes through the origin
    for spine in ['left', 'bottom']:
        ax.spines[spine].set_position('zero')
    for spine in ['right', 'top']:
        ax.spines[spine].set_color('none')
    ax.grid()
    return fig, ax
print(plt.style.available)
We can now use the plt.style.use() method to set the style sheet.
Let’s write a function that takes the name of a style sheet and draws different plots with the style
def draw_graphs(style='default'):
    plt.style.use(style)   # set the style sheet
    fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 3))
    x = np.linspace(-4, 4, 150)
    # Draw three normal densities with randomly drawn parameters
    for i in range(3):
        m, s = uniform(-1, 1), uniform(1, 2)
        axes[i].plot(x, norm.pdf(x, loc=m, scale=s), linewidth=3, alpha=0.7)
    style_name = style.split('-')[0]
    plt.suptitle(f'Style: {style_name}', fontsize=13)
    plt.show()
draw_graphs(style='seaborn-v0_8')
draw_graphs(style='grayscale')
draw_graphs(style='ggplot')
draw_graphs(style='dark_background')
You can use the function to experiment with other styles in the list.
If you are interested, you can even create your own style sheets.
Parameters for your style sheets are stored in a dictionary-like variable plt.rcParams
print(plt.rcParams.keys())
There are many parameters you could set for your style sheets.
Set parameters for your style sheet by:
1. creating your own matplotlibrc file, or
2. updating values stored in the dictionary-like variable plt.rcParams
Let’s change the style of our overlaid density lines using the second method
# Update linewidth
plt.rcParams['lines.linewidth'] = 2

# You can also update many values at once using the update() method
# (the particular keys below are illustrative):
parameters = {
    'figure.figsize': (5, 4),   # change default figure size
    'axes.grid': True,          # add horizontal grid lines
    'axes.grid.axis': 'y',
    'font.size': 10             # update font size
}
plt.rcParams.update(parameters)
fig, ax = plt.subplots()
x = np.linspace(-4, 4, 150)
for i in range(3):
    m, s = uniform(-1, 1), uniform(1, 2)
    y = norm.pdf(x, loc=m, scale=s)
    current_label = fr'$\mu = {m:.2}$'
    ax.plot(x, y, linewidth=2, alpha=0.6, label=current_label)
ax.legend()
plt.show()
Apply the default style sheet again to change your style back to default
plt.style.use('default')
12.5 Exercises
Exercise 12.5.1
Plot the function
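A minimal sketch of the setup, assuming the target function is f(x, θ) = cos(πθx)e^{-x} on [0, 5] with θ ranging over np.linspace(0, 2, 10):

def f(x, θ):
    # assumed target function: f(x, θ) = cos(πθx) e^{-x}
    return np.cos(np.pi * θ * x) * np.exp(-x)

x = np.linspace(0, 5, 200)       # plotting grid (assumed)
θ_vals = np.linspace(0, 2, 10)   # parameter values (assumed)
fig, ax = plt.subplots()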
for θ in θ_vals:
    ax.plot(x, f(x, θ))
plt.show()
THIRTEEN
SCIPY
13.1 Overview
SciPy builds on top of NumPy to provide common tools for scientific programming such as
• linear algebra
• numerical integration
• interpolation
• optimization
• distributions and random number generation
• signal processing
• etc., etc
Like NumPy, SciPy is stable, mature and widely used.
Many SciPy routines are thin wrappers around industry-standard Fortran libraries such as LAPACK, BLAS, etc.
It’s not really necessary to “learn” SciPy as a whole.
A more common approach is to get some idea of what’s in the library and then look up documentation as required.
In this lecture, we aim only to highlight some useful parts of the package.
SciPy is a package that contains various tools that are built on top of NumPy, using its array data type and related
functionality.
In fact, when we import SciPy we also get NumPy, as can be seen from this excerpt from the SciPy initialization file:
However, it’s more common and better practice to use NumPy functionality explicitly.
import numpy as np
a = np.identity(3)
13.3 Statistics
np.random.beta(5, 5, size=3)
This generates a draw from the distribution with the density function below when a, b = 5, 5

$$f(x; a, b) = \frac{x^{a-1}(1 - x)^{b-1}}{\int_0^1 u^{a-1}(1 - u)^{b-1}\, du} \qquad (0 \le x \le 1) \tag{13.1}$$
Sometimes we need access to the density itself, or the cdf, the quantiles, etc.
For this, we can use scipy.stats, which provides all of this functionality as well as random number generation in a
single consistent interface.
Here’s an example of usage
from scipy.stats import beta

q = beta(5, 5)                        # Beta(a, b), with a = b = 5
obs = q.rvs(2000)                     # 2000 observations
grid = np.linspace(0.01, 0.99, 100)

fig, ax = plt.subplots()
ax.hist(obs, bins=40, density=True)
ax.plot(grid, q.pdf(grid), 'k-', linewidth=2)
plt.show()
The object q that represents the distribution has additional useful methods, including

q.cdf(0.4)   # Cumulative distribution function
0.26656768000000003

q.ppf(0.8)   # Quantile (inverse cdf) function
0.6339134834642708

q.mean()
0.5
The general syntax for creating these objects that represent distributions (of type rv_frozen) is
name = scipy.stats.distribution_name(shape_parameters, loc=c, scale=d)
Here distribution_name is one of the distribution names in scipy.stats.
The loc and scale parameters transform the original random variable 𝑋 into 𝑌 = 𝑐 + 𝑑𝑋.
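For example, a quick sketch of this transformation (the particular numbers are our own): with loc=5 and scale=2, the uniform distribution on (0, 1) becomes the uniform distribution on (5, 7), i.e., Y = 5 + 2X.

from scipy.stats import uniform

u = uniform(loc=5, scale=2)   # Y = 5 + 2X with X ~ U(0, 1)
u.mean(), u.support()         # (6.0, (5.0, 7.0))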
obs = beta.rvs(5, 5, size=2000)
grid = np.linspace(0.01, 0.99, 100)

fig, ax = plt.subplots()
ax.hist(obs, bins=40, density=True)
ax.plot(grid, beta.pdf(grid, 5, 5), 'k-', linewidth=2)
plt.show()
from scipy.stats import linregress

x = np.random.randn(200)
y = 2 * x + 0.1 * np.random.randn(200)
gradient, intercept, r_value, p_value, std_err = linregress(x, y)
gradient, intercept

(2.001368810940636, -0.008026829443346734)
13.4 Roots and Fixed Points

A root or zero of a real function $f$ on $[a, b]$ is an $x \in [a, b]$ such that $f(x) = 0$.
For example, if we plot the function

$$f(x) = \sin(4(x - 1/4)) + x + x^{20} - 1 \tag{13.2}$$

with $x \in [0, 1]$ we get

f = lambda x: np.sin(4 * (x - 1/4)) + x + x**20 - 1
x = np.linspace(0, 1, 100)

fig, ax = plt.subplots()
ax.plot(x, f(x), label='$f(x)$')
ax.axhline(ls='--', c='k')
ax.set_xlabel('$x$', fontsize=12)
ax.set_ylabel('$f(x)$', fontsize=12)
ax.legend(fontsize=12)
plt.show()
13.4.1 Bisection

Here is a homemade implementation of the bisection algorithm

def bisect(f, a, b, tol=10e-5):
    """
    Implements the bisection root finding algorithm, assuming that f is a
    real-valued function on [a, b] satisfying f(a) < 0 < f(b).
    """
    lower, upper = a, b
    while upper - lower > tol:
        middle = 0.5 * (upper + lower)
        if f(middle) > 0:    # root is between lower and middle
            lower, upper = lower, middle
        else:                # root is between middle and upper
            lower, upper = middle, upper
    return 0.5 * (upper + lower)

Let's test it, using (13.2)

bisect(f, 0, 1)
0.408294677734375

SciPy provides its own bisection function, which gives

from scipy.optimize import bisect
bisect(f, 0, 1)
0.4082935042806639

13.4.2 The Newton-Raphson Method

SciPy's newton function implements the faster but less robust Newton-Raphson method, whose results depend on the initial condition

from scipy.optimize import newton
newton(f, 0.2)   # Start the search at initial condition x = 0.2
0.40829350427935673

newton(f, 0.7)   # Start the search at initial condition x = 0.7
0.7001700000000279

The second starting point leads the algorithm astray.

13.4.3 Hybrid Methods

A general principle of numerical methods is that we trade off speed against robustness; hybrid methods such as brentq aim at both

brentq(f, 0, 1)
0.40829350427936706
Here the correct solution is found and the speed is better than bisection:
%timeit brentq(f, 0, 1)
21 μs ± 134 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit bisect(f, 0, 1)
83.3 μs ± 992 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
13.4.4 Fixed Points

A fixed point of $f$ is an $x$ satisfying $f(x) = x$. SciPy has a function for finding (scalar) fixed points

from scipy.optimize import fixed_point
fixed_point(lambda x: x**2, 10.0)   # 10.0 is an initial guess
array(1.)
If you don't get good results, you can always switch back to the brentq root finder, since the fixed point of a function $f$ is the root of $g(x) := x - f(x)$.
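As a quick sketch of this trick, with an initial bracket of our own choosing:

from scipy.optimize import brentq

f = lambda x: x**2       # fixed points of f solve x**2 = x
g = lambda x: x - f(x)   # roots of g are fixed points of f
brentq(g, 0.5, 10.0)     # returns 1.0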
13.5 Optimization

For constrained, univariate (i.e., scalar) minimization, a good option is fminbound

from scipy.optimize import fminbound
fminbound(lambda x: x**2, -1, 2)   # search in [-1, 2]
0.0
Multivariate local optimizers include minimize, fmin, fmin_powell, fmin_cg, fmin_bfgs, and fmin_ncg.
Constrained multivariate local optimizers include fmin_l_bfgs_b, fmin_tnc, fmin_cobyla.
See the documentation for details.
13.6 Integration
Most numerical integration methods work by computing the integral of an approximating polynomial.
The resulting error depends on how well the polynomial fits the integrand, which in turn depends on how “regular” the
integrand is.
In SciPy, the relevant module for numerical integration is scipy.integrate.
A good default for univariate integration is quad

from scipy.integrate import quad
integral, error = quad(lambda x: x**2, 0, 1)
integral
0.33333333333333337
In fact, quad is an interface to a very standard numerical integration routine in the Fortran library QUADPACK.
It uses Clenshaw-Curtis quadrature, based on expansion in terms of Chebychev polynomials.
There are other options for univariate integration—a useful one is fixed_quad, which is fast and hence works well
inside for loops.
There are also functions for multivariate integration.
See the documentation for more details.
13.7 Linear Algebra

We saw that NumPy provides a module for linear algebra called linalg.
SciPy also provides a module for linear algebra with the same name.
The latter is not an exact superset of the former, but overall it has more functionality.
We leave you to investigate the set of available routines.
13.8 Exercises
The first few exercises concern pricing a European call option under the assumption of risk neutrality. The price satisfies
$$P = \beta^n \,\mathbb{E}\, \max\{S_n - K, 0\}$$
where
1. 𝛽 is a discount factor,
2. 𝑛 is the expiry date,
3. 𝐾 is the strike price and
4. {𝑆𝑡 } is the price of the underlying asset at each time 𝑡.
For example, if the call option is to buy stock in Amazon at strike price 𝐾, the owner has the right (but not the obligation)
to buy 1 share in Amazon at price 𝐾 after 𝑛 days.
The payoff is therefore max{𝑆𝑛 − 𝐾, 0}
The price is the expectation of the payoff, discounted to current value.
Exercise 13.8.1
Suppose that 𝑆𝑛 has the log-normal distribution with parameters 𝜇 and 𝜎. Let 𝑓 denote the density of this distribution.
Then
$$P = \beta^n \int_0^\infty \max\{x - K, 0\}\, f(x)\, dx$$

Plot the integrand $g(x) = \beta^n \max\{x - K, 0\}\, f(x)$ over the interval $[0, 400]$ when μ, σ, β, n, K = 4, 0.25, 0.99, 10, 40.
Hint: From scipy.stats you can import lognorm and then use lognorm.pdf(x, σ, scale=np.exp(μ)) to get the density $f$.
from scipy.stats import lognorm

μ, σ, β, n, K = 4, 0.25, 0.99, 10, 40

def g(x):
    return β**n * np.maximum(x - K, 0) * lognorm.pdf(x, σ, scale=np.exp(μ))
Exercise 13.8.2
In order to get the option price, compute the integral of this function numerically using quad from scipy.integrate.
Exercise 13.8.3
Try to get a similar result using Monte Carlo to compute the expectation term in the option price, rather than quad.
In particular, use the fact that if $S_n^1, \ldots, S_n^M$ are independent draws from the lognormal distribution specified above, then, by the law of large numbers,

$$\mathbb{E}\, \max\{S_n - K, 0\} \approx \frac{1}{M} \sum_{m=1}^{M} \max\{S_n^m - K, 0\}$$
Set M = 10_000_000
M = 10_000_000
S = np.exp(μ + σ * np.random.randn(M))
return_draws = np.maximum(S - K, 0)
P = β**n * np.mean(return_draws)
print(f"The Monte Carlo option price is {P:.3f}")
Exercise 13.8.4
In this lecture, we discussed the concept of recursive function calls.
Try to write a recursive implementation of the homemade bisection function described above.
Test it on the function (13.2).
0.408294677734375
FOURTEEN
PANDAS
In addition to what’s in Anaconda, this lecture will need the following libraries:
14.1 Overview
Just as NumPy provides the basic array data type plus core array operations, pandas
1. defines fundamental structures for working with data and
2. endows them with methods that facilitate operations such as
• reading in data
• adjusting indices
• working with dates and time series
• sorting, grouping, re-ordering and general data munging1
• dealing with missing values, etc., etc.
More sophisticated statistical functionality is left to other packages, such as statsmodels and scikit-learn, which are built
on top of pandas.
This lecture will provide a basic introduction to pandas.
Throughout the lecture, we will assume that the following imports have taken place
1 Wikipedia defines munging as cleaning data from one raw form into a structured, purged one.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
Two important data types defined by pandas are Series and DataFrame.
You can think of a Series as a “column” of data, such as a collection of observations on a single variable.
A DataFrame is a two-dimensional object for storing related columns of data.
14.2 Series

Let's start with Series. We begin by creating a series of four random observations

s = pd.Series(np.random.randn(4), name='daily returns')
s

0 -0.356705
1 -0.778773
2 -0.121393
3 -0.019880
Name: daily returns, dtype: float64
Here you can imagine the indices 0, 1, 2, 3 as indexing four listed companies, and the values being daily returns on
their shares.
Pandas Series are built on top of NumPy arrays and support many similar operations
s * 100
0 -35.670542
1 -77.877261
2 -12.139258
3 -1.988006
Name: daily returns, dtype: float64
np.abs(s)
0 0.356705
1 0.778773
2 0.121393
3 0.019880
Name: daily returns, dtype: float64
s.describe()
count 4.000000
mean -0.319188
std 0.337310
min -0.778773
25% -0.462222
50% -0.239049
75% -0.096014
max -0.019880
Name: daily returns, dtype: float64
But Series provide more than NumPy arrays: their indices can be meaningful. Here we relabel the index with the companies' ticker symbols

s.index = ['AMZN', 'AAPL', 'MSFT', 'GOOG']
s

AMZN -0.356705
AAPL -0.778773
MSFT -0.121393
GOOG -0.019880
Name: daily returns, dtype: float64
Viewed in this way, Series are like fast, efficient Python dictionaries (with the restriction that the items in the dictionary
all have the same type—in this case, floats).
In fact, you can use much of the same syntax as Python dictionaries
s['AMZN']
-0.35670542141932043
s['AMZN'] = 0
s
AMZN 0.000000
AAPL -0.778773
MSFT -0.121393
GOOG -0.019880
Name: daily returns, dtype: float64
'AAPL' in s
True
14.3 DataFrames
While a Series is a single column of data, a DataFrame is several columns, one for each variable.
In essence, a DataFrame in pandas is analogous to a (highly optimized) Excel spreadsheet.
Thus, it is a powerful tool for representing and analyzing data that are naturally organized into rows and columns, often
with descriptive indexes for individual rows and individual columns.
Let’s look at an example that reads data from the CSV file pandas/data/test_pwt.csv, which is taken from the
Penn World Tables.
The dataset contains the following indicators
We’ll read this in from a URL using the pandas function read_csv.
df = pd.read_csv('https://raw.githubusercontent.com/QuantEcon/lecture-python-programming/master/source/_static/lecture_specific/pandas/data/test_pwt.csv')
type(df)
pandas.core.frame.DataFrame
df
cc cg
0 75.716805 5.578804
1 67.759026 6.720098
2 64.575551 14.072206
3 64.436451 10.266688
4 74.707624 11.658954
5 72.718710 5.726546
6 72.347054 6.032454
7 78.978740 5.108068
In practice, one thing that we do all the time is to find, select and work with the subset of the data we are interested in.
We can select particular rows using standard Python array slicing notation
df[2:5]
cc cg
2 64.575551 14.072206
3 64.436451 10.266688
4 74.707624 11.658954
To select columns, we can pass a list containing the names of the desired columns represented as strings
df[['country', 'tcgdp']]
country tcgdp
0 Argentina 2.950722e+05
1 Australia 5.418047e+05
2 India 1.728144e+06
3 Israel 1.292539e+05
4 Malawi 5.026222e+03
5 South Africa 2.272424e+05
6 United States 9.898700e+06
7 Uruguay 2.525596e+04
To select both rows and columns using integers, the iloc attribute should be used with the format .iloc[rows, columns].
df.iloc[2:5, 0:4]
To select rows and columns using a mixture of integers and labels, the loc attribute can be used in a similar way
df.loc[df.index[2:5], ['country', 'tcgdp']]

country tcgdp
2 India 1.728144e+06
3 Israel 1.292539e+05
4 Malawi 5.026222e+03
Instead of indexing rows and columns using integers and names, we can also obtain a sub-dataframe of our interests that
satisfies certain (potentially complicated) conditions.
This section demonstrates various ways to do that.
The most straightforward way is with the [] operator.
df[df.POP >= 20000]

cc cg
0 75.716805 5.578804
2 64.575551 14.072206
5 72.718710 5.726546
6 72.347054 6.032454
To understand what is going on here, notice that df.POP >= 20000 returns a series of boolean values.
df.POP >= 20000

0 True
1 False
2 True
3 False
4 False
5 True
6 True
7 False
Name: POP, dtype: bool
In this case, df[___] takes a series of boolean values and only returns rows with the True values.
Take one more example,

df[(df.country.isin(['Argentina', 'India', 'South Africa'])) & (df.POP > 40000)]

cc cg
2 64.575551 14.072206
5 72.718710 5.726546
However, there is another way of doing the same thing, which can be slightly faster for large dataframes, with more natural
syntax.
df.query("POP >= 20000")

cc cg
0 75.716805 5.578804
2 64.575551 14.072206
5 72.718710 5.726546
6 72.347054 6.032454
df.query("country in ['Argentina', 'India', 'South Africa'] and POP > 40000")

cc cg
2 64.575551 14.072206
5 72.718710 5.726546
We can also allow arithmetic operations between different columns

df[(df.cc + df.cg >= 80) & (df.POP <= 20000)]

cc cg
4 74.707624 11.658954
7 78.978740 5.108068
df.query("cc + cg >= 80 & POP <= 20000")

cc cg
4 74.707624 11.658954
7 78.978740 5.108068
For example, we can use the conditioning to select the country with the largest household consumption - gdp share cc.
df.loc[df.cc == max(df.cc)]
cg
7 5.108068
When we only want to look at certain columns of a selected sub-dataframe, we can use the above conditions with the
.loc[__ , __] command.
The first argument takes the condition, while the second argument takes a list of columns we want to return.
df.loc[(df.cc + df.cg >= 80) & (df.POP <= 20000), ['country', 'year', 'POP']]
We can save the resulting subset for later use (here df_subset is assumed to hold the selection made above)

df_subset = df.loc[(df.cc + df.cg >= 80) & (df.POP <= 20000), ['country', 'year', 'POP']]
df_subset.to_csv('pwt_subset.csv', index=False)
We can apply a function to every column by passing it to .apply(); for example, max

df[['year', 'POP', 'XRAT', 'tcgdp', 'cc', 'cg']].apply(max)

year 2.000000e+03
POP 1.006300e+06
XRAT 5.954381e+01
tcgdp 9.898700e+06
cc 7.897874e+01
cg 1.407221e+01
dtype: float64
This line of code applies the max function to all selected columns.
A lambda function is often used with the df.apply() method.
A trivial example is to return each row unchanged

df.apply(lambda row: row, axis=1)

cc cg
0 75.716805 5.578804
1 67.759026 6.720098
2 64.575551 14.072206
3 64.436451 10.266688
4 74.707624 11.658954
5 72.718710 5.726546
6 72.347054 6.032454
7 78.978740 5.108068
complexCondition = df.apply(
    lambda row: row.POP > 40000 if row.country in ['Argentina', 'India', 'South Africa']
                else row.POP < 20000,
    axis=1), ['country', 'year', 'POP', 'XRAT', 'tcgdp']

df.apply() here returns a series of boolean values indicating which rows satisfy the condition specified in the if-else statement.
In addition, it also defines a subset of variables of interest.
complexCondition
(0 False
1 True
2 True
3 True
4 True
5 True
6 False
7 True
dtype: bool,
['country', 'year', 'POP', 'XRAT', 'tcgdp'])
df.loc[complexCondition]
The ability to make changes in dataframes is important to generate a clean dataset for future analysis.
1. We can use df.where() conveniently to "keep" the rows we have selected and replace the rest of the rows with any other values

df.where(df.POP >= 20000, False)
cc cg
0 75.716805 5.578804
1 False False
2 64.575551 14.072206
3 False False
4 False False
5 72.71871 5.726546
6 72.347054 6.032454
7 False False
2. We can simply use .loc[] to specify the column that we want to modify, and assign values

df.loc[df.cg == max(df.cg), 'cg'] = np.nan
cc cg
0 75.716805 5.578804
1 67.759026 6.720098
2 64.575551 NaN
3 64.436451 10.266688
4 74.707624 11.658954
5 72.718710 5.726546
6 72.347054 6.032454
7 78.978740 5.108068
3. We can use the df.apply() method to modify entire rows or columns

def update_row(row):
    # modify POP
    row.POP = np.nan if row.POP <= 10000 else row.POP
    # modify XRAT
    row.XRAT = row.XRAT / 10
    return row

df.apply(update_row, axis=1)
cc cg
0 75.716805 5.578804
1 67.759026 6.720098
2 64.575551 NaN
3 64.436451 10.266688
4 74.707624 11.658954
5 72.718710 5.726546
6 72.347054 6.032454
7 78.978740 5.108068
4. We can use the .applymap() method to modify all individual entries in the dataframe altogether.

# Round all decimal numbers to two decimal places
df.applymap(lambda x: round(x, 2) if type(x) != str else x)
cg
0 5.58
1 6.72
2 NaN
3 10.27
4 11.66
5 5.73
6 6.03
7 5.11
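A plausible sketch of how such missing values can be inserted, treating each zipped pair as a (row, column) position, is

# Insert NaN values at the zipped (row, column) positions (positions assumed)
for idx in list(zip([0, 3, 5, 6], [3, 4, 6, 2])):
    df.iloc[idx] = np.nan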
df
tcgdp cc cg
0 2.950722e+05 75.716805 5.578804
1 5.418047e+05 67.759026 6.720098
2 1.728144e+06 64.575551 NaN
3 1.292539e+05 64.436451 10.266688
4 5.026222e+03 74.707624 11.658954
5 2.272424e+05 NaN 5.726546
6 9.898700e+06 72.347054 6.032454
7 2.525596e+04 78.978740 5.108068
The zip() function here creates pairs of values from the two lists (i.e. [0,3], [3,4] …)
We can use the .applymap() method again to replace all missing values with 0

def replace_nan(x):
    if type(x) != str:
        return 0 if np.isnan(x) else x
    else:
        return x

df.applymap(replace_nan)
tcgdp cc cg
0 2.950722e+05 75.716805 5.578804
1 5.418047e+05 67.759026 6.720098
2 1.728144e+06 64.575551 0.000000
3 1.292539e+05 64.436451 10.266688
4 5.026222e+03 74.707624 11.658954
5 2.272424e+05 0.000000 5.726546
6 9.898700e+06 72.347054 6.032454
7 2.525596e+04 78.978740 5.108068
pandas also provides convenient methods to impute missing values; for example, single imputation using column means

df = df.fillna(df.iloc[:,2:8].mean())
df
tcgdp cc cg
0 2.950722e+05 75.716805 5.578804
1 5.418047e+05 67.759026 6.720098
2 1.728144e+06 64.575551 7.298802
3 1.292539e+05 64.436451 10.266688
4 5.026222e+03 74.707624 11.658954
5 2.272424e+05 71.217322 5.726546
6 9.898700e+06 72.347054 6.032454
7 2.525596e+04 78.978740 5.108068
Missing value imputation is a big area in data science involving various machine learning techniques.
There are also more advanced tools in Python to impute missing values.
Let's imagine that we're only interested in the population (POP) and total GDP (tcgdp).
One way to strip the data frame df down to only these variables is to overwrite the dataframe using the selection method described above

df = df[['country', 'POP', 'tcgdp']]
Here the index 0, 1,..., 7 is redundant because we can use the country names as an index.
To do this, we set the index to be the country variable in the dataframe
df = df.set_index('country')
df
POP tcgdp
country
Argentina 1.962465e+05 2.950722e+05
Australia 1.905319e+04 5.418047e+05
India 1.006300e+06 1.728144e+06
Israel 6.114570e+03 1.292539e+05
Malawi 1.180150e+04 5.026222e+03
South Africa 4.506410e+04 2.272424e+05
United States 2.821720e+05 9.898700e+06
Uruguay 3.219793e+03 2.525596e+04
Next, we're going to add a column showing real GDP per capita, multiplying by 1,000,000 as we go because total GDP is in millions

df['GDP percap'] = df['tcgdp'] * 1e6 / df['POP']
One of the nice things about pandas DataFrame and Series objects is that they have methods for plotting and visu-
alization that work through Matplotlib.
For example, we can easily generate a bar plot of GDP per capita
ax = df['GDP percap'].plot(kind='bar')
ax.set_xlabel('country', fontsize=12)
ax.set_ylabel('GDP per capita', fontsize=12)
plt.show()
At the moment the data frame is ordered alphabetically on the countries—let's change it to GDP per capita

df = df.sort_values(by='GDP percap', ascending=False)
ax = df['GDP percap'].plot(kind='bar')
ax.set_xlabel('country', fontsize=12)
ax.set_ylabel('GDP per capita', fontsize=12)
plt.show()
14.4 On-Line Data Sources

Python makes it straightforward to query online databases programmatically.
One option is to use requests, a standard Python library for requesting data over the Internet.
To begin, try the following code on your computer
r = requests.get('https://fred.stlouisfed.org/graph/fredgraph.csv?bgcolor=%23e1e9f0&chart_type=line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&height=450&mode=fred&recession_bars=on&txtcolor=%23444444&ts=12&tts=12&width=1318&nt=0&thu=0&trc=0&show_legend=yes&show_axis_titles=yes&show_tooltip=yes&id=UNRATE&scale=left&cosd=1948-01-01&coed=2024-06-01&line_color=%234572a7&link_values=false&line_style=solid&mark_type=none&mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq=Monthly&fam=avg&fgst=lin&fgsnd=2020-02-01&line_index=1&transformation=lin&vintage_date=2024-07-29&revision_date=2024-07-29&nd=1948-01-01')
url = 'https://fred.stlouisfed.org/graph/fredgraph.csv?bgcolor=%23e1e9f0&chart_type=line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&height=450&mode=fred&recession_bars=on&txtcolor=%23444444&ts=12&tts=12&width=1318&nt=0&thu=0&trc=0&show_legend=yes&show_axis_titles=yes&show_tooltip=yes&id=UNRATE&scale=left&cosd=1948-01-01&coed=2024-06-01&line_color=%234572a7&link_values=false&line_style=solid&mark_type=none&mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq=Monthly&fam=avg&fgst=lin&fgsnd=2020-02-01&line_index=1&transformation=lin&vintage_date=2024-07-29&revision_date=2024-07-29&nd=1948-01-01'

source = requests.get(url).content.decode().split("\n")

source[0]
'DATE,UNRATE'

source[1]
'1948-01-01,3.4'

source[2]
'1948-02-01,3.8'
We could now write some additional code to parse this text and store it as an array.
But this is unnecessary — pandas’ read_csv function can handle the task for us.
We use parse_dates=True so that pandas recognizes our dates column, allowing for simple date filtering

data = pd.read_csv(url, index_col=0, parse_dates=True)
The data has been read into a pandas DataFrame called data that we can now manipulate in the usual way
type(data)
pandas.core.frame.DataFrame
data.head()   # A useful method to get a quick look at a data frame

UNRATE
DATE
1948-01-01 3.4
1948-02-01 3.8
1948-03-01 4.0
1948-04-01 3.9
1948-05-01 3.5
pd.set_option('display.precision', 1)
data.describe() # Your output might differ slightly
UNRATE
count 918.0
mean 5.7
std 1.7
min 2.5
25% 4.4
We can also plot the unemployment rate from 2006 to 2012 as follows

ax = data['2006':'2012'].plot(title='US Unemployment Rate', legend=False)
ax.set_xlabel('year', fontsize=12)
ax.set_ylabel('%', fontsize=12)
plt.show()
The maker of pandas has also authored a library called pandas_datareader that gives programmatic access to many data
sources straight from the Jupyter notebook.
While some sources require an access key, many of the most important (e.g., FRED, OECD, EUROSTAT and the World
Bank) are free to use.
We will also use yfinance to fetch data from Yahoo finance in the exercises.
For now let’s work through one example of downloading and plotting data — this time from the World Bank.
Note: There are also other Python libraries available for working with World Bank data, such as wbgapi.
The World Bank collects and organizes data on a huge range of indicators.
For example, here’s some data on government debt as a ratio to GDP.
The next code example fetches the data for you and plots time series for the US and Australia

from pandas_datareader import wb

govt_debt = wb.download(indicator='GC.DOD.TOTL.GD.ZS', country=['US', 'AU'],
                        start=2005, end=2016).stack().unstack(0)
ind = govt_debt.index.droplevel(-1)
govt_debt.index = ind
ax = govt_debt.plot(lw=2)
ax.set_xlabel('year', fontsize=12)
plt.title("Government Debt to GDP (%)")
plt.show()
The documentation provides more details on how to access various data sources.
14.5 Exercises
Exercise 14.5.1
With these imports:
import datetime as dt
import yfinance as yf
Write a program to calculate the percentage price change over 2021 for the following shares:
def read_data(ticker_list,
              start=dt.datetime(2021, 1, 1),
              end=dt.datetime(2021, 12, 31)):
    """
    This function reads in closing price data from Yahoo
    for each tick in the ticker_list.
    """
    ticker = pd.DataFrame()

    for tick in ticker_list:
        prices = yf.download(tick, start, end)
        closing_prices = prices['Close']
        ticker[tick] = closing_prices

    return ticker
ticker = read_data(ticker_list)
Complete the program to plot the result as a bar graph like this one:
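A minimal sketch of the computation, assuming ticker holds the closing prices returned by read_data, is

p1 = ticker.iloc[0]                    # first closing price of 2021
p2 = ticker.iloc[-1]                   # last closing price of 2021
price_change = (p2 - p1) / p1 * 100    # percentage change over the year
price_change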
INTC 6.9
MSFT 57.2
IBM 18.7
BHP -10.5
TM 20.1
AAPL 38.6
AMZN 5.8
C 3.6
QCOM 25.3
KO 14.9
GOOG 69.0
dtype: float64
Alternatively you can use an inbuilt method pct_change and configure it to perform the correct calculation using
periods argument.
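As a sketch of that alternative, setting periods to the number of rows minus one makes the last row of pct_change equal to the change over the whole sample

change = ticker.pct_change(periods=len(ticker.index) - 1) * 100
price_change = change.iloc[-1]   # the only non-NaN row
price_change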
INTC 6.9
MSFT 57.2
IBM 18.7
BHP -10.5
TM 20.1
AAPL 38.6
AMZN 5.8
C 3.6
QCOM 25.3
KO 14.9
GOOG 69.0
Name: 2021-12-30 00:00:00, dtype: float64
price_change.sort_values(inplace=True)
price_change = price_change.rename(index=ticker_list)
fig, ax = plt.subplots(figsize=(10,8))
ax.set_xlabel('stock', fontsize=12)
ax.set_ylabel('percentage change in price', fontsize=12)
price_change.plot(kind='bar', ax=ax)
plt.show()
/tmp/ipykernel_2301/232489783.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
price_change.sort_values(inplace=True)
Exercise 14.5.2
Using the method read_data introduced in Exercise 14.5.1, write a program to obtain year-on-year percentage change
for the following indices:
Complete the program to show summary statistics and plot the result as a time series graph like this one:
indices_data = read_data(
indices_list,
start=dt.datetime(1971, 1, 1), #Common Start Date
end=dt.datetime(2021, 12, 31)
)
Then, extract the first and last set of prices per year as DataFrames and calculate the yearly returns such as:
yearly_returns = pd.DataFrame()
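A sketch of the loop that fills this DataFrame, assuming indices_data holds one price column per index, is

for index, col in indices_data.items():
    first = col.groupby(col.index.year).first()   # first price in each year
    last = col.groupby(col.index.year).last()     # last price in each year
    yearly_returns[index] = 100 * (last - first) / first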
yearly_returns
Next, you can obtain summary statistics by using the method describe.
yearly_returns.describe()
plt.tight_layout()
FIFTEEN

PANDAS FOR PANEL DATA
In addition to what’s in Anaconda, this lecture will need the following libraries:
15.1 Overview
We will read in a dataset from the OECD of real minimum wages in 32 countries and assign it to realwage.
The dataset can be accessed with the following link:
url1 = 'https://raw.githubusercontent.com/QuantEcon/lecture-python/master/source/_static/lecture_specific/pandas_panel/realwage.csv'
import pandas as pd
realwage = pd.read_csv(url1)
The data is currently in long format, which is difficult to analyze when there are several dimensions to the data.
We will use pivot_table to create a wide format panel, with a MultiIndex to handle higher dimensional data.
pivot_table arguments should specify the data (values), the index, and the columns we want in our resulting
dataframe.
By passing a list in columns, we can create a MultiIndex in our column axis
realwage = realwage.pivot_table(values='value',
index='Time',
columns=['Country', 'Series', 'Pay period'])
realwage.head()
Country Australia \
Series In 2015 constant prices at 2015 USD PPPs
Pay period Annual Hourly
Time
Country ... \
Series In 2015 constant prices at 2015 USD exchange rates ...
Pay period Annual ...
Time ...
2006-01-01 23,826.64 ...
2007-01-01 24,616.84 ...
2008-01-01 24,185.70 ...
2009-01-01 24,496.84 ...
2010-01-01 24,373.76 ...
Country
Series In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly
Time
2006-01-01 12,594.40 6.05
2007-01-01 12,974.40 6.24
2008-01-01 14,097.56 6.78
2009-01-01 15,756.42 7.58
2010-01-01 16,391.31 7.88
To filter our time series data more easily later on, we will convert the index into a DateTimeIndex
realwage.index = pd.to_datetime(realwage.index)
type(realwage.index)
pandas.core.indexes.datetimes.DatetimeIndex
The columns contain multiple levels of indexing, known as a MultiIndex, with levels being ordered hierarchically
(Country > Series > Pay period).
A MultiIndex is the simplest and most flexible way to manage panel data in pandas
type(realwage.columns)
pandas.core.indexes.multi.MultiIndex
realwage.columns.names
Like before, we can select the country (the top level of our MultiIndex)
realwage['United States'].head()
Stacking and unstacking levels of the MultiIndex will be used throughout this lecture to reshape our dataframe into a
format we need.
.stack() rotates the lowest level of the column MultiIndex to the row index (.unstack() works in the opposite
direction - try it out)
realwage.stack().head()
Country Australia \
Series In 2015 constant prices at 2015 USD PPPs
Time Pay period
2006-01-01 Annual 20,410.65
Hourly 10.33
2007-01-01 Annual 21,087.57
Hourly 10.67
2008-01-01 Annual 20,718.24
Country \
Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 23,826.64
Country
Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 12,594.40
Hourly 6.05
2007-01-01 Annual 12,974.40
Hourly 6.24
2008-01-01 Annual 14,097.56
[5 rows x 64 columns]
We can also pass in an argument to select the level we would like to stack
realwage.stack(level='Country').head()
realwage.loc['2015'].stack(level=(1, 2)).transpose().head()
Time 2015-01-01 \
Series In 2015 constant prices at 2015 USD PPPs
Pay period Annual Hourly
Country
Australia 21,715.53 10.99
Belgium 21,588.12 10.35
Brazil 4,628.63 2.00
Canada 16,536.83 7.95
Chile 6,633.56 2.80
Time
Series In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly
Country
Australia 25,349.90 12.83
Belgium 20,753.48 9.95
Brazil 2,842.28 1.21
Canada 17,367.24 8.35
Chile 4,251.49 1.81
For the rest of lecture, we will work with a dataframe of the hourly real minimum wages across countries and time,
measured in 2015 US dollars.
To create our filtered dataframe (realwage_f), we can use the xs method to select values at lower levels in the
multiindex, while keeping the higher levels (countries in this case)
realwage_f = realwage.xs(('Hourly', 'In 2015 constant prices at 2015 USD exchange rates'),
                         level=('Pay period', 'Series'), axis=1)
realwage_f.head()

[5 rows x 32 columns]
Similar to relational databases like SQL, pandas has built in methods to merge datasets together.
Using country information from WorldData.info, we’ll add the continent of each country to realwage_f with the
merge function.
The dataset can be accessed with the following link:
url2 = 'https://raw.githubusercontent.com/QuantEcon/lecture-python/master/source/_static/lecture_specific/pandas_panel/countries.csv'

worlddata = pd.read_csv(url2, sep=';')
worlddata.head()

[5 rows x 17 columns]
First, we’ll select just the country and continent variables from worlddata and rename the column to ‘Country’
worlddata = worlddata[['Country (en)', 'Continent']]
worlddata = worlddata.rename(columns={'Country (en)': 'Country'})
worlddata.head()

Country Continent
0 Afghanistan Asia
1 Egypt Africa
2 Åland Islands Europe
3 Albania Europe
4 Algeria Africa
realwage_f.transpose().head()
Time 2016-01-01
Country
Australia 12.98
Belgium 9.76
Brazil 1.24
Canada 8.48
Chile 1.91
[5 rows x 11 columns]
We can use either left, right, inner, or outer join to merge our datasets:
• left join includes only countries from the left dataset
• right join includes only countries from the right dataset
• outer join includes countries that are in either the left and right datasets
• inner join includes only countries common to both the left and right datasets
By default, merge will use an inner join.
Here we will pass how='left' to keep all countries in realwage_f, but discard countries in worlddata that do
not have a corresponding data entry realwage_f.
This is illustrated by the red shading in the following diagram
We will also need to specify where the country name is located in each dataframe, which will be the key that is used to
merge the dataframes ‘on’.
Our ‘left’ dataframe (realwage_f.transpose()) contains countries in the index, so we set left_index=True.
Our ‘right’ dataframe (worlddata) contains countries in the ‘Country’ column, so we set right_on='Country'
merged = pd.merge(realwage_f.transpose(), worlddata,
                  how='left', left_index=True, right_on='Country')
merged.head()

[5 rows x 13 columns]
Countries that appeared in realwage_f but not in worlddata will have NaN in the Continent column.
To check whether this has occurred, we can use .isnull() on the continent column and filter the merged dataframe
merged[merged['Continent'].isnull()]
[3 rows x 13 columns]
We can fill in the missing continents by hand, building a dictionary and applying it with .map()

missing_continents = {'Korea': 'Asia',
                      'Russian Federation': 'Europe',
                      'Slovak Republic': 'Europe'}

merged['Country'].map(missing_continents)
17.00 NaN
23.00 NaN
32.00 NaN
100.00 NaN
38.00 NaN
108.00 NaN
41.00 NaN
225.00 NaN
53.00 NaN
58.00 NaN
45.00 NaN
68.00 NaN
233.00 NaN
86.00 NaN
88.00 NaN
91.00 NaN
NaN Asia
117.00 NaN
122.00 NaN
123.00 NaN
138.00 NaN
153.00 NaN
151.00 NaN
174.00 NaN
175.00 NaN
NaN Europe
NaN Europe
198.00 NaN
merged['Continent'] = merged['Continent'].fillna(merged['Country'].map(missing_continents))
merged[merged['Country'] == 'Korea']
[1 rows x 13 columns]
We will also combine the Americas into a single continent - this will make our visualization nicer later on.
To do this, we will use .replace() and loop through a list of the continent values we want to replace

replace = ['Central America', 'North America', 'South America']

for country in replace:
    merged['Continent'].replace(to_replace=country,
                                value='America',
                                inplace=True)
Now that we have all the data we want in a single DataFrame, we will reshape it back into panel form with a Multi-
Index.
We should also ensure to sort the index using .sort_index() so that we can efficiently filter our dataframe later on

merged = merged.set_index(['Continent', 'Country']).sort_index()
merged.head()
2015-01-01 2016-01-01
Continent Country
America Brazil 1.21 1.24
Canada 8.35 8.48
Chile 1.81 1.91
Colombia 1.13 1.12
Costa Rica 2.56 2.63
[5 rows x 11 columns]
While merging, we lost our DatetimeIndex, as we merged columns that were not in datetime format
merged.columns
Now that we have set the merged columns as the index, we can recreate a DatetimeIndex using .to_datetime()
merged.columns = pd.to_datetime(merged.columns)
merged.columns = merged.columns.rename('Time')
merged.columns
The DatetimeIndex tends to work more smoothly in the row axis, so we will go ahead and transpose merged
merged = merged.transpose()
merged.head()
[5 rows x 32 columns]
Grouping and summarizing data can be particularly useful for understanding large panel datasets.
A simple way to summarize data is to call an aggregation method on the dataframe, such as .mean() or .max().
For example, we can calculate the average real minimum wage for each country over the period 2006 to 2016 (the default
is to aggregate over rows)
merged.mean().head(10)
Continent Country
America Brazil 1.09
Canada 7.82
Chile 1.62
Colombia 1.07
Costa Rica 2.53
Mexico 0.53
United States 7.15
Asia Israel 5.95
Japan 6.18
Korea 4.22
dtype: float64
Using this series, we can plot the average real minimum wage over the past decade for each country in our data set
merged.mean().sort_values(ascending=False).plot(kind='bar',
                                                title="Average real minimum wage 2006 - 2016")
plt.show()
Passing in axis=1 to .mean() will aggregate over columns (giving the average minimum wage for all countries over
time)
merged.mean(axis=1).head()
Time
2006-01-01 4.69
2007-01-01 4.84
2008-01-01 4.90
2009-01-01 5.08
2010-01-01 5.11
dtype: float64
merged.mean(axis=1).plot()
plt.title('Average real minimum wage 2006 - 2016')
plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()
We can also specify a level of the MultiIndex (in the column axis) to aggregate over.
In the case of groupby we need to use .T to transpose the columns into rows as pandas has deprecated the use of
axis=1 in the groupby method.
merged.T.groupby(level='Continent').mean().head()
Time 2016-01-01
Continent
America 3.30
Asia 5.44
Australia 11.73
Europe 5.57
We can plot the average minimum wages in each continent as a time series
merged.T.groupby(level='Continent').mean().T.plot()
plt.title('Average real minimum wage')
plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()
merged.stack().describe()
The groupby method follows a split-apply-combine strategy:
• split: the data is grouped based on one or more keys
• apply: a function is called on each group independently
• combine: the results of the function calls are combined into a new data structure
The groupby method achieves the first step of this process, creating a new DataFrameGroupBy object with data
split into groups.
Let’s split merged by continent again, this time using the groupby function, and name the resulting object grouped
grouped = merged.T.groupby(level='Continent')
grouped
Calling an aggregation method on the object applies the function to each group, the results of which are combined in a
new data structure.
For example, we can return the number of countries in our dataset for each continent using .size().
In this case, our new data structure is a Series
grouped.size()
Continent
America 7
Asia 4
Europe 19
dtype: int64
Calling .get_group() to return just the countries in a single group, we can create a kernel density estimate of the
distribution of real minimum wages in 2016 for each continent.
grouped.groups.keys() will return the keys from the groupby object
continents = grouped.groups.keys()
This lecture has provided an introduction to some of pandas’ more advanced features, including multiindices, merging,
grouping and plotting.
Other tools that may be useful in panel data analysis include xarray, a Python package that extends pandas to N-dimensional data structures.
15.6 Exercises
Exercise 15.6.1
In these exercises, you’ll work with a dataset of employment rates in Europe by age and sex from Eurostat.
The dataset can be accessed with the following link:
url3 = 'https://raw.githubusercontent.com/QuantEcon/lecture-python/master/source/_static/lecture_specific/pandas_panel/employ.csv'
Reading in the CSV file returns a panel dataset in long format. Use .pivot_table() to construct a wide format
dataframe with a MultiIndex in the columns.
Start off by exploring the dataframe and the variables available in the MultiIndex levels.
employ = pd.read_csv(url3)
employ = employ.pivot_table(values='Value',
index=['DATE'],
columns=['UNIT','AGE', 'SEX', 'INDIC_EM', 'GEO'])
employ.index = pd.to_datetime(employ.index) # ensure that dates are datetime format
employ.head()
UNIT
AGE
SEX
INDIC_EM
GEO United Kingdom
DATE
2007-01-01 4,131.00
2008-01-01 4,204.00
2009-01-01 4,193.00
2010-01-01 4,186.00
2011-01-01 4,164.00
This is a large dataset so it is useful to explore the levels and variables available
employ.columns.names
Exercise 15.6.2
Filter the above dataframe to only include employment as a percentage of ‘active population’.
Create a grouped boxplot using seaborn of employment rates in 2015 by age group and sex.
To easily filter by country, we can swap GEO to the top level of the MultiIndex and sort it

employ.columns = employ.columns.swaplevel(0,-1)
employ = employ.sort_index(axis=1)
We need to get rid of a few items in GEO which are not countries.
A fast way to get rid of the EU areas is to use a list comprehension to find the level values in GEO that begin with ‘Euro’
geo_list = employ.columns.get_level_values('GEO').unique().tolist()
countries = [x for x in geo_list if not x.startswith('Euro')]
employ = employ[countries]
employ.columns.get_level_values('GEO').unique()
Select only percentage employed in the active population from the dataframe
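A sketch of this selection using .xs(), where the exact level labels ('Percentage of total population' for UNIT and 'Active population' for INDIC_EM) are assumptions to be checked against employ.columns.get_level_values():

employ_f = employ.xs(('Percentage of total population', 'Active population'),
                     level=('UNIT', 'INDIC_EM'), axis=1)
employ_f.head()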
GEO
AGE
SEX Total
DATE
2007-01-01 59.30
2008-01-01 59.80
2009-01-01 60.30
2010-01-01 60.00
2011-01-01 59.70
box = employ_f.loc['2015'].unstack().reset_index()
sns.boxplot(x="AGE", y=0, hue="SEX", data=box, palette=("husl"), showfliers=False)
plt.xlabel('')
plt.xticks(rotation=35)
plt.ylabel('Percentage of population (%)')
plt.title('Employment in Europe (2015)')
plt.legend(bbox_to_anchor=(1,0.5))
plt.show()
SIXTEEN
SYMPY
16.1 Overview
Unlike numerical libraries that deal with values, SymPy focuses on manipulating mathematical symbols and expressions
directly.
SymPy provides a wide range of features including
• symbolic expression
• equation solving
• simplification
• calculus
• matrices
• discrete math, etc.
These functions make SymPy a popular open-source alternative to other proprietary symbolic computational software
such as Mathematica.
In this lecture, we will explore some of the functionality of SymPy and demonstrate how to use basic SymPy functions
to solve economic models.
Let’s first import the library and initialize the printer for symbolic output
import numpy as np
import matplotlib.pyplot as plt
from sympy import *
from sympy.stats import Exponential, cdf, E, moment
init_printing(use_latex='mathjax')
16.3.1 Symbols
x, y, z = symbols('x y z')
16.3.2 Expressions
expr = (x+y) ** 2
expr
$$(x + y)^2$$
expand_expr = expand(expr)
expand_expr
$$x^2 + 2xy + y^2$$
and factorize it back to the factored form with the factor function
factor(expand_expr)
$$(x + y)^2$$
solve(expr)

[{x: -y}]

When we solve an expression, SymPy treats it as an equation set equal to zero, i.e.

$$(x + y)^2 = 0$$
Note: Solvers is an important module with tools to solve different types of equations.
There are a variety of solvers available in SymPy depending on the nature of the problem.
16.3.3 Equations
eq = Eq(expr, 0)
eq
$$(x + y)^2 = 0$$
Solving this equation with respect to 𝑥 gives the same output as solving the expression directly
solve(eq, x)
[−𝑦]
eq = Eq(expr, 1)
solve(eq, x)
[1 − 𝑦, −𝑦 − 1]
solve function can also combine multiple equations together and solve a system of equations
eq2 = Eq(x, y)
eq2
𝑥=𝑦
solve([eq, eq2])

$$\left[ \left(-\tfrac{1}{2}, -\tfrac{1}{2}\right), \left(\tfrac{1}{2}, \tfrac{1}{2}\right) \right]$$
expr_sub = expr.subs(x, y)
expr_sub
4𝑦2
solve(Eq(expr_sub, 1))
$$\left[-\tfrac{1}{2}, \tfrac{1}{2}\right]$$
Below is another example equation with the symbol x and functions sin, cos, and tan using the Eq function
# Create an equation
eq = Eq(cos(x) / (tan(x)/sin(x)), 0)
eq
# Simplify an expression
simplified_expr = simplify(eq)
simplified_expr
$$\cos^2(x) = 0$$

solve(simplified_expr, x)

$$\left[-\tfrac{\pi}{2}, \tfrac{\pi}{2}\right]$$
SymPy can also handle more complex equations involving trigonometry and complex numbers.
We demonstrate this using Euler's formula

euler = cos(x) + I*sin(x)
simplify(euler)

$$e^{ix}$$
If you are interested, we encourage you to read the lecture on trigonometry and complex numbers.
As an application, consider the steady state of the Solow growth model, which is characterized by

$$k^* = s A (k^*)^\alpha + (1 - \delta) k^*$$

We define the symbols, build the equation, and solve for $k^*$

A, s, k, α, δ = symbols('A s k^* α δ')
solow = Eq(s * A * k**α + (1 - δ) * k, k)
solow

$$A (k^*)^\alpha s + k^* (1 - \delta) = k^*$$

solve(solow, k)

$$\left[ \left(\frac{A s}{\delta}\right)^{-\frac{1}{\alpha - 1}} \right]$$
SymPy also allows users to define inequalities and set operators and provides a wide range of operations.

reduce_inequalities([2*x + 5*y <= 30, 4*x + 2*y <= 20], [x])

$$x \le 5 - \frac{y}{2} \;\wedge\; x \le 15 - \frac{5 y}{2} \;\wedge\; -\infty < x$$

And(2*x + 5*y <= 30, x > 0)

$$2x + 5y \le 30 \;\wedge\; x > 0$$
16.3.5 Series
Series are widely used in economics and statistics, from asset pricing to the expectation of discrete random variables.
We can construct a simple series of summations using Sum function and Indexed symbols
x, y, i, j = symbols("x y i j")
sum_xy = Sum(Indexed('x', i)*Indexed('y', j),
(i, 0, 3),
(j, 0, 3))
sum_xy
$$\sum_{\substack{0 \le i \le 3 \\ 0 \le j \le 3}} x_i y_j$$
36
D = symbols('D_0')
r = Symbol('r', positive=True)
Dt = Sum('(1 - r)^i * D_0', (i, 0, oo))
Dt
$$\sum_{i=0}^{\infty} D_0 (1 - r)^i$$
Dt.doit()
$$D_0 \left(\begin{cases} \frac{1}{r} & \text{for } |r - 1| < 1 \\ \sum_{i=0}^{\infty} (1 - r)^i & \text{otherwise} \end{cases}\right)$$
simplify(Dt.doit())
$$\begin{cases} \frac{D_0}{r} & \text{for } r > 0 \wedge r < 2 \\ D_0 \sum_{i=0}^{\infty} (1 - r)^i & \text{otherwise} \end{cases}$$
We can also work with probability mass functions; for example, the Poisson distribution has pmf

$$f(x) = \frac{\lambda^x e^{-\lambda}}{x!}$$

We can verify if the sum of probabilities for all possible values equals 1:

$$\sum_{x=0}^{\infty} f(x) = 1$$
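Here is a minimal sketch of that check in SymPy (the symbol definitions below are our own):

λ = Symbol('lambda', positive=True)
x = Symbol('x', integer=True, nonnegative=True)
pmf = λ**x * exp(-λ) / factorial(x)
simplify(Sum(pmf, (x, 0, oo)).doit())   # evaluates to 1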
SymPy allows us to perform various calculus operations, such as limits, differentiation, and integration.
16.4.1 Limits
We can compute limits for a given expression using the limit function
# Define an expression
f = x**2 / (x-1)
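For instance, we can evaluate the limit of this expression as x approaches 0 (the evaluation point here is our own choice):

limit(f, x, 0)   # returns 0, since f(0) = 0 / (0 - 1) = 0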
16.4.2 Derivatives

We can compute the derivative of a given expression using the diff function

diff(f, x)

$$-\frac{x^2}{(x - 1)^2} + \frac{2 x}{x - 1}$$
16.4.3 Integrals

We can compute definite and indefinite integrals using the integrate function; for example, integrating the derivative above recovers $f$ up to a constant

integrate(diff(f, x), x)

$$x + \frac{1}{x - 1}$$
Let’s use this function to compute the moment-generating function of exponential distribution with the probability density
function:
λ = Symbol('lambda', positive=True)
x = Symbol('x', positive=True)
pdf = λ * exp(-λ*x)
pdf
𝜆𝑒−𝜆𝑥
t = Symbol('t', positive=True)
moment_t = integrate(exp(t*x) * pdf, (x, 0, oo))
simplify(moment_t)
$$\begin{cases} \frac{\lambda}{\lambda - t} & \text{for } \lambda > t \\ \lambda \int_0^{\infty} e^{x(t - \lambda)}\, dx & \text{otherwise} \end{cases}$$
Note that we can also use Stats module to compute the moment
X = Exponential(x, λ)
moment(X, 1)
$$\frac{1}{\lambda}$$
E(X**t)
$$\lambda^{-t}\, \Gamma(t + 1)$$
Using the integrate function, we can derive the cumulative distribution function of the exponential distribution with $\lambda = 0.5$

λ_pdf = pdf.subs(λ, 1/2)
λ_pdf

$$0.5 e^{-0.5 x}$$

integrate(λ_pdf, (x, 0, 4))

0.864664716763387
Using cdf in the Stats module gives the same answer

cdf(X, 1/2)

$$\left(z \mapsto \begin{cases} 1 - e^{-z \lambda} & \text{for } z \ge 0 \\ 0 & \text{otherwise} \end{cases}\right)$$

# Plug in a value for z
λ_cdf = cdf(X, 1/2)(4)
λ_cdf

$$1 - e^{-4 \lambda}$$

# Substitute λ
λ_cdf.subs({λ: 1/2})

0.864664716763387
16.5 Plotting
Imagine a pure exchange economy with two people (𝑎 and 𝑏) and two goods recorded as proportions (𝑥 and 𝑦).
They can trade goods with each other according to their preferences.
Assume that the utility functions of the consumers are given by
$$u_a(x, y) = x^\alpha y^{1-\alpha}, \qquad u_b(x, y) = (1 - x)^\beta (1 - y)^{1-\beta}$$

where $\alpha, \beta \in (0, 1)$. First we define the symbols and the utility functions

α, β = symbols('α β', positive=True)
u_a = x**α * y**(1 - α)
u_b = (1 - x)**β * (1 - y)**(1 - β)

u_a

$$x^\alpha y^{1-\alpha}$$

u_b

$$(1 - x)^\beta (1 - y)^{1-\beta}$$
Setting the two consumers' marginal rates of substitution equal gives the Pareto optimality condition

$$\frac{\alpha y}{(1 - \alpha)\, x} = \frac{\beta (1 - y)}{(1 - \beta)(1 - x)}$$

and solving this equation for $y$ yields the contract curve

$$y = \frac{x \beta (\alpha - 1)}{x\alpha - x\beta + \alpha\beta - \alpha}$$
Let’s compute the Pareto optimal allocations of the economy (contract curves) with 𝛼 = 𝛽 = 0.5 using SymPy
1.0𝑥
We can use this result to visualize more contract curves under different parameters
We invite you to play with the parameters and see how the contract curves change and think about the following two
questions:
• Can you think of a way to draw the same graph using numpy?
• How difficult will it be to write a numpy implementation?
16.7 Exercises
Exercise 16.7.1
L’Hôpital’s rule states that for two functions 𝑓(𝑥) and 𝑔(𝑥), if lim𝑥→𝑎 𝑓(𝑥) = lim𝑥→𝑎 𝑔(𝑥) = 0 or ±∞, then
$$\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}$$
f_upper = y**x - 1
f_lower = x
f = f_upper/f_lower
f
$$\frac{y^x - 1}{x}$$
lim = limit(f, x, 0)
lim

$$\log(y)$$
Exercise 16.7.2
Maximum likelihood estimation (MLE) is a method to estimate the parameters of a statistical model.
It usually involves maximizing a log-likelihood function and solving the first-order derivative.
The binomial distribution is given by
$$f(x; n, \theta) = \frac{n!}{x!\,(n - x)!}\, \theta^x (1 - \theta)^{n - x}$$
n, x, θ = symbols('n x θ')

binomial_factor = factorial(n) / (factorial(x) * factorial(n - x))
binomial_factor

$$\frac{n!}{x!\,(n - x)!}$$

bino_dist = binomial_factor * θ**x * (1 - θ)**(n - x)
bino_dist

$$\frac{\theta^x (1 - \theta)^{n - x}\, n!}{x!\,(n - x)!}$$
Now we compute the log-likelihood function, differentiate with respect to θ, and solve the first-order condition

log_bino_dist = log(bino_dist)

log_bino_diff = simplify(diff(log_bino_dist, θ))
solve(Eq(log_bino_diff, 0), θ)[0]

$$\frac{x}{n}$$
CHAPTER
SEVENTEEN
NUMBA
In addition to what’s in Anaconda, this lecture will need the following libraries:
Please also make sure that you have the latest version of Anaconda, since old versions are a common source of errors.
Let’s start with some imports:
import numpy as np
import quantecon as qe
import matplotlib.pyplot as plt
17.1 Overview
In an earlier lecture we learned about vectorization, which is one method to improve speed and efficiency in numerical
work.
Vectorization involves sending array processing operations in batch to efficient low-level code.
However, as discussed previously, vectorization has several weaknesses.
One is that it is highly memory-intensive when working with large amounts of data.
Another is that the set of algorithms that can be entirely vectorized is not universal.
In fact, for some algorithms, vectorization is ineffective.
Fortunately, a new Python library called Numba solves many of these problems.
It does so through something called just in time (JIT) compilation.
The key idea is to compile functions to native machine code instructions on the fly.
When it succeeds, the compiled code is extremely fast.
Numba is specifically designed for numerical work and can also do other tricks such as multithreading.
Numba will be a key part of our lectures — especially those lectures involving dynamic programming.
This lecture introduces the main ideas.
As stated above, Numba’s primary use is compiling functions to fast native machine code during runtime.
17.2.1 An Example
Let’s consider a problem that is difficult to vectorize: generating the trajectory of a difference equation given an initial
condition.
We will take the difference equation to be the quadratic map
$$x_{t+1} = \alpha x_t (1 - x_t)$$

with $\alpha = 4.0$. Here's a plain Python implementation

def qm(x0, n):
    x = np.empty(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = α * x[t] * (1 - x[t])
    return x

α = 4.0
Here’s the plot of a typical trajectory, starting from 𝑥0 = 0.1, with 𝑡 on the x-axis
x = qm(0.1, 250)
fig, ax = plt.subplots()
ax.plot(x, 'b-', lw=2, alpha=0.8)
ax.set_xlabel('$t$', fontsize=12)
ax.set_ylabel('$x_{t}$', fontsize = 12)
plt.show()
from numba import njit

qm_numba = njit(qm)
n = 10_000_000
qe.tic()
qm(0.1, int(n))
time1 = qe.toc()
qe.tic()
qm_numba(0.1, int(n))
time2 = qe.toc()
qe.tic()
qm_numba(0.1, int(n))
time3 = qe.toc()

The first compiled call includes compile time; comparing the pure Python run with the second compiled run gives the speed gain

time1 / time3

138.38702623245348
This kind of speed gain is huge relative to how simple and clear the implementation is.
Numba attempts to generate fast machine code using the infrastructure provided by the LLVM Project.
It does this by inferring type information on the fly.
(See our earlier lecture on scientific computing for a discussion of types.)
The basic idea is this:
• Python is very flexible and hence we could call the function qm with many types.
– e.g., x0 could be a NumPy array or a list, n could be an integer or a float, etc.
• This makes it hard to pre-compile the function.
• However, when we do actually call the function, say by executing qm(0.5, 10), the types of x0 and n become
clear.
• Moreover, the types of other variables in qm can be inferred once the input is known.
• So the strategy of Numba and other JIT compilers is to wait until this moment, and then compile the function.
That’s why it is called “just-in-time” compilation.
Note that, if you make the call qm(0.5, 10) and then follow it with qm(0.9, 20), compilation only takes place
on the first call.
The compiled code is then cached and recycled as required.
In the code above we created a JIT compiled version of qm via the call
qm_numba = njit(qm)
To target a function for JIT compilation we can put @njit before the function definition.
Here’s what this looks like for qm
@njit
def qm(x0, n):
    x = np.empty(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = α * x[t] * (1 - x[t])
    return x
%%time
qm(0.1, 100_000)
CPU times: user 64.5 ms, sys: 4.02 ms, total: 68.5 ms
Wall time: 68.2 ms
Numba provides several arguments for decorators to accelerate computation and cache functions; see the Numba documentation for details.
In the following lecture on parallelization, we will discuss how to use the parallel argument to achieve automatic
parallelization.
Consider a bootstrap routine that takes another function as an argument

@njit
def bootstrap(data, statistics, n_resamples):
    bootstrap_stat = np.empty(n_resamples)
    n = len(data)
    for i in range(n_resamples):
        resample = np.random.choice(data, size=n, replace=True)
        bootstrap_stat[i] = statistics(resample)
    return bootstrap_stat

# Note: no @njit decorator here, so passing this function into the
# compiled bootstrap routine triggers an error
def mean(data):
    return np.mean(data)

data = np.array([2.3, 3.1, 4.3, 5.9, 2.1, 3.8, 2.2])  # illustrative sample
n_resamples = 10

#Error
try:
    bootstrap(data, mean, n_resamples)
except Exception as e:
    print(e)
@njit
def mean(data):
    return np.mean(data)

%time bootstrap(data, mean, n_resamples)

CPU times: user 282 ms, sys: 27.9 ms, total: 310 ms
Wall time: 310 ms
bootstrap.signatures
The function bootstrap takes one float64 floating point array, one function called mean and an int64 integer.
Now let’s see what happens when we change the inputs.
Running it again with a larger integer for n and a different set of data does not change the signature of the function.
CPU times: user 519 ms, sys: 35.8 ms, total: 555 ms
Wall time: 556 ms
from numba import float64
from numba.experimental import jitclass

solow_data = [
('n', float64),
('s', float64),
('δ', float64),
('α', float64),
('z', float64),
('k', float64)
]
@jitclass(solow_data)
class Solow:
    r"""
    Implements the Solow growth model with the update rule

        k_{t+1} = [(s z k^α_t) + (1 - δ)k_t] / (1 + n)

    """
    def __init__(self, n=0.05,  # population growth rate
                       s=0.25,  # savings rate
                       δ=0.1,   # depreciation rate
                       α=0.3,   # share of labor
                       z=2.0,   # productivity
                       k=1.0):  # current capital stock

        self.n, self.s, self.δ, self.α, self.z = n, s, δ, α, z
        self.k = k

    def h(self):
        "Evaluate the h function"
        # Unpack parameters (get rid of self to simplify notation)
        n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
        # Apply the update rule
        return (s * z * self.k**α + (1 - δ) * self.k) / (1 + n)

    def update(self):
        "Update the current state (i.e., the capital stock)."
        self.k = self.h()

    def steady_state(self):
        "Compute the steady state value of capital."
        # Unpack parameters (get rid of self to simplify notation)
        n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
        # Compute and return steady state
        return ((s * z) / (n + δ))**(1 / (1 - α))

    def generate_sequence(self, t):
        "Generate and return a time series of length t"
        path = []
        for i in range(t):
            path.append(self.k)
            self.update()
        return path
First we specified the types of the instance data for the class in solow_data.
After that, targeting the class for JIT compilation only requires adding @jitclass(solow_data) before the class
definition.
When we call the methods in the class, the methods are compiled just like functions.
s1 = Solow()
s2 = Solow(k=8.0)
T = 60
fig, ax = plt.subplots()
17.6.1 Cython
Like Numba, Cython provides an approach to generating fast compiled code that can be used from Python.
As was the case with Numba, a key problem is the fact that Python is dynamically typed.
As you’ll recall, Numba solves this problem (where possible) by inferring type.
Cython’s approach is different — programmers add type definitions directly to their “Python” code.
As such, the Cython language can be thought of as Python with type definitions.
In addition to a language specification, Cython is also a language translator, transforming Cython code into optimized C
and C++ code.
Cython also takes care of building language extensions — the wrapper code that interfaces between the resulting compiled
code and Python.
While Cython has certain advantages, we generally find it both slower and more cumbersome than Numba.
If you are comfortable writing Fortran you will find it very easy to create extension modules from Fortran code using
F2Py.
F2Py is a Fortran-to-Python interface generator that is particularly simple to use.
Robert Johansson provides a nice introduction to F2Py, among other things.
Recently, a Jupyter cell magic for Fortran has been developed — you might want to give it a try.
17.7.1 Limitations
As we’ve seen, Numba needs to infer type information on all variables to generate fast machine-level instructions.
For simple routines, Numba infers types very well.
For larger ones, or for routines using external libraries, it can easily fail.
Hence, it’s prudent when using Numba to focus on speeding up small, time-critical snippets of code.
This will give you much better performance than blanketing your Python programs with @njit statements.
a = 1
@njit
def add_a(x):
return a + x
print(add_a(10))
11
a = 2
print(add_a(10))
11
Notice that changing the global had no effect on the value returned by the function.
When Numba compiles machine code for functions, it treats global variables as constants to ensure type stability.
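A simple workaround, sketched below, is to pass anything that might change as an argument instead of relying on a global:

from numba import njit

@njit
def add_a(x, a):
    # a is an argument, so changes are seen on every call
    return a + x

print(add_a(10, 1))   # 11
print(add_a(10, 2))   # 12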
17.8 Exercises
Exercise 17.8.1
Previously we considered how to approximate 𝜋 by Monte Carlo.
Use the same idea here, but make the code efficient using Numba.
Compare speed with and without Numba when the sample size is large.
from random import uniform

@njit
def calculate_pi(n=1_000_000):
    count = 0
    for i in range(n):
        u, v = uniform(0, 1), uniform(0, 1)
        d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
        if d < 0.5:
            count += 1

    area_estimate = count / n
    return area_estimate * 4  # dividing by radius**2
%time calculate_pi()
3.141168
%time calculate_pi()
3.144308
If we switch off JIT compilation by removing @njit, the code takes around 150 times as long on our machine.
So we get a speed gain of two orders of magnitude, which is huge, by adding four characters of code.
Exercise 17.8.2
In the Introduction to Quantitative Economics with Python lecture series you can learn all about finite-state Markov chains.
For now, let’s just concentrate on simulating a very simple example of such a chain.
Suppose that the volatility of returns on an asset can be in one of two regimes — high or low.
The transition probabilities across states are as follows: a high state today moves to high tomorrow with probability 0.8 and to low with probability 0.2, while a low state stays low with probability 0.9 and moves to high with probability 0.1.
For example, let the period length be one day, and suppose the current state is high.
We see from these probabilities that the state tomorrow will be
• high with probability 0.8
• low with probability 0.2
Your task is to simulate a sequence of daily volatility states according to this rule.
Set the length of the sequence to n = 1_000_000 and start in the high state.
Implement a pure Python version and a Numba version, and compare speeds.
To test your code, evaluate the fraction of time that the chain spends in the low state.
If your code is correct, it should be about 2/3.
Hint:
• Represent the low state as 0 and the high state as 1.
• If you want to store integers in a NumPy array and then apply JIT compilation, use x = np.empty(n,
dtype=np.int_).
p, q = 0.1, 0.2  # Prob of leaving the low and high state respectively

def compute_series(n):
    x = np.empty(n, dtype=np.int_)
    x[0] = 1  # Start in state 1
    U = np.random.uniform(0, 1, size=n)
    for t in range(1, n):
        current_x = x[t-1]
        if current_x == 0:
            x[t] = U[t] < p
        else:
            x[t] = U[t] > q
    return x
Let’s run this code and check that the fraction of time spent in the low state is about 0.666
n = 1_000_000
x = compute_series(n)
print(np.mean(x == 0)) # Fraction of time x is in state 0
0.666601
qe.tic()
compute_series(n)
qe.toc()
0.45053696632385254
compute_series_numba = njit(compute_series)
x = compute_series_numba(n)
print(np.mean(x == 0))
0.666707
qe.tic()
compute_series_numba(n)
qe.toc()
0.007447957992553711
EIGHTEEN
PARALLELIZATION
In addition to what’s in Anaconda, this lecture will need the following libraries:
18.1 Overview
The growth of CPU clock speed (i.e., the speed at which a single chain of logic can be run) has slowed dramatically in
recent years.
This is unlikely to change in the near future, due to inherent physical limitations on the construction of chips and circuit
boards.
Chip designers and computer programmers have responded to the slowdown by seeking a different path to fast execution:
parallelization.
Hardware makers have increased the number of cores (physical CPUs) embedded in each machine.
For programmers, the challenge has been to exploit these multiple CPUs by running many processes in parallel (i.e.,
simultaneously).
This is particularly important in scientific programming, which requires handling
• large amounts of data and
• CPU intensive simulations and other calculations.
In this lecture we discuss parallelization for scientific computing, with a focus on
1. the best tools for parallelization in Python and
2. how these tools can be applied to quantitative economic problems.
Let’s start with some imports:
import numpy as np
import quantecon as qe
import matplotlib.pyplot as plt
18.2 Types of Parallelization

Large textbooks have been written on different approaches to parallelization but we will keep a tight focus on what’s most useful to us.
We will briefly review the two main kinds of parallelization commonly used in scientific computing and discuss their pros
and cons.
18.2.1 Multiprocessing
Multiprocessing means concurrent execution of multiple processes using more than one processor.
In this context, a process is a chain of instructions (i.e., a program).
Multiprocessing can be carried out on one machine with multiple CPUs or on a collection of machines connected by a
network.
In the latter case, the collection of machines is usually called a cluster.
With multiprocessing, each process has its own memory space, although the physical memory chip might be shared.
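As a minimal illustration of multiprocessing (a sketch of our own using the standard library, not code from this lecture; best run as a script):

from concurrent.futures import ProcessPoolExecutor
import math

def cpu_bound_task(n):
    # A CPU-intensive computation that benefits from a separate process
    return sum(math.sqrt(i) for i in range(n))

if __name__ == '__main__':
    with ProcessPoolExecutor() as pool:
        # Four tasks run concurrently, each in its own process and memory space
        results = list(pool.map(cpu_bound_task, [10**6] * 4))
    print(results)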
18.2.2 Multithreading
Multithreading is similar to multiprocessing, except that, during execution, the threads all share the same memory space.
Native Python struggles to implement multithreading due to some legacy design features (most notably the Global Interpreter Lock, which prevents multiple threads from executing Python bytecode at the same time).
But this is not a restriction for scientific libraries like NumPy and Numba.
Functions imported from these libraries and JIT-compiled code run in low level execution environments where Python’s
legacy restrictions don’t apply.
Multithreading is more lightweight because most system and memory resources are shared by the threads.
In addition, the fact that multiple threads all access a shared pool of memory is extremely convenient for numerical
programming.
On the other hand, multiprocessing is more flexible and can be distributed across clusters.
For the great majority of what we do in these lectures, multithreading will suffice.
18.3 Implicit Multithreading in NumPy

Actually, you have already been using multithreading in your Python code, although you might not have realized it.
(We are, as usual, assuming that you are running the latest version of Anaconda Python.)
This is because NumPy cleverly implements multithreading in a lot of its compiled code.
Let’s look at some examples to see this in action.
The next piece of code computes the eigenvalues of a large number of randomly generated matrices.
It takes a few seconds to run.
n = 20
m = 1000

for i in range(n):
    X = np.random.randn(m, m)
    λ = np.linalg.eigvals(X)
Now, let’s look at the output of the htop system monitor on our machine while this code is running: the load is spread across several CPUs at once, showing that NumPy’s compiled routines are multithreaded.
Over the last few years, NumPy has managed to push this kind of multithreading out to more and more operations.
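If you don’t have a system monitor handy, one alternative (our suggestion, using the third-party threadpoolctl package rather than anything from this lecture) is to cap the BLAS thread pool and compare timings:

import time
import numpy as np
from threadpoolctl import threadpool_limits

X = np.random.randn(2000, 2000)

for n_threads in (1, 4):
    with threadpool_limits(limits=n_threads):
        start = time.time()
        np.linalg.eigvals(X)   # the LAPACK call runs inside the limited pool
        print(n_threads, 'threads:', time.time() - start, 'seconds')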
For example, let’s return to a maximization problem discussed previously, evaluating a function on a large grid:

def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

grid = np.linspace(-3, 3, 5000)
x, y = np.meshgrid(grid, grid)

%timeit np.max(f(x, y))

456 ms ± 3.76 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
If you have a system monitor such as htop (Linux/Mac) or perfmon (Windows), then try running this and then observing
the load on your CPUs.
(You will probably need to bump up the grid size to see large effects.)
At least on our machine, the output shows that the operation is successfully distributed across multiple threads.
This is one of the reasons why the vectorized code above is fast.
18.3.1 A Comparison with Numba

To get some basis for comparison for the last example, let’s try the same thing with Numba.
In fact there is an easy way to do this, since Numba can also be used to create custom ufuncs with the @vectorize decorator.
from numba import vectorize

@vectorize
def f_vec(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

np.max(f_vec(x, y))  # Run once to compile

0.9999992797121728

%timeit np.max(f_vec(x, y))

333 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
At least on our machine, the difference in the speed between the Numba version and the vectorized NumPy version shown
above is not large.
But there’s quite a bit going on here so let’s try to break down what is happening.
Both Numba and NumPy use efficient machine code that’s specialized to these floating point operations.
However, the code NumPy uses is, in some ways, less efficient.
The reason is that, in NumPy, the operation np.cos(x**2 + y**2) / (1 + x**2 + y**2) generates several
intermediate arrays.
For example, a new array is created when x**2 is calculated.
The same is true when y**2 is calculated, and then x**2 + y**2 and so on.
Numba avoids creating all these intermediate arrays by compiling one function that is specialized to the entire operation.
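To make the point about temporaries concrete, here is roughly what NumPy does behind the scenes when evaluating the expression (a sketch of our own):

import numpy as np

grid = np.linspace(-3, 3, 1000)
x, y = np.meshgrid(grid, grid)

t1 = x**2                         # a new full-size array is allocated
t2 = y**2                         # another new array
t3 = t1 + t2                      # and another
result = np.cos(t3) / (1 + t3)    # two more temporaries before the final result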
But if this is true, then why isn’t the Numba code faster?
The reason is that NumPy makes up for its disadvantages with implicit multithreading, as we’ve just discussed.

Can we get both sets of advantages at once? We can, by asking Numba to multithread the compiled ufunc via the parallel target:

@vectorize('float64(float64, float64)', target='parallel')
def f_vec(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

np.max(f_vec(x, y))  # Run once to compile

0.9999992797121728

%timeit np.max(f_vec(x, y))

129 ms ± 433 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Now our code runs significantly faster than the NumPy version.
18.4 Multithreaded Loops in Numba

We just saw one approach to parallelization in Numba, using the parallel flag in @vectorize.

This is neat but, it turns out, not well suited to many problems we consider.
Fortunately, Numba provides another approach to multithreading that will work for us almost everywhere parallelization
is possible.
To illustrate, let’s look first at a simple, single-threaded (i.e., non-parallelized) piece of code.
The code simulates updating the wealth 𝑤𝑡 of a household via the rule

$$w_{t+1} = R_{t+1} s w_t + y_{t+1}$$

Here
• 𝑅 is the gross rate of return on assets
• 𝑠 is the savings rate of the household and
• 𝑦 is labor income.
We model both 𝑅 and 𝑦 as independent draws from a lognormal distribution.
Here’s the code:
from numpy.random import randn
from numba import njit, prange

@njit
def h(w, r=0.1, s=0.3, v1=0.1, v2=1.0):  # default parameter values
    """
    Updates household wealth.
    """
    # Draw shocks
    R = np.exp(v1 * randn()) * (1 + r)
    y = np.exp(v2 * randn())
    # Update wealth
    w = R * s * w + y
    return w
Let’s have a look at how wealth evolves under this rule:

fig, ax = plt.subplots()

T = 100
w = np.empty(T)
w[0] = 5
for t in range(T-1):
    w[t+1] = h(w[t])

ax.plot(w)
ax.set_xlabel('$t$', fontsize=12)
ax.set_ylabel('$w_{t}$', fontsize=12)
plt.show()
Now let’s suppose that we have a large population of households and we want to know what median wealth will be.
This is not easy to solve with pencil and paper, so we will use simulation instead.
In particular, we will simulate a large number of households and then calculate median wealth for this group.
Suppose we are interested in the long-run average of this median over time.
It turns out that, for the specification that we’ve chosen above, we can calculate this by taking a one-period snapshot of
what has happened to median wealth of the group at the end of a long simulation.
Moreover, provided the simulation period is long enough, initial conditions don’t matter.
• This is due to something called ergodicity, which we will discuss later on.
So, in summary, we are going to simulate 50,000 households by
1. arbitrarily setting initial wealth to 1 and
2. simulating forward in time for 1,000 periods.
Then we’ll calculate median wealth at the end period.
Here’s the code:
@njit
def compute_long_run_median(w0=1, T=1000, num_reps=50_000):
    obs = np.empty(num_reps)
    for i in range(num_reps):
        w = w0
        for t in range(T):
            w = h(w)
        obs[i] = w
    return np.median(obs)
%%time
compute_long_run_median()
1.8330713828632823
To speed this up, we parallelize the outer loop over households, adding the parallel=True flag and replacing range with prange:

@njit(parallel=True)
def compute_long_run_median_parallel(w0=1, T=1000, num_reps=50_000):
    obs = np.empty(num_reps)
    for i in prange(num_reps):
        w = w0
        for t in range(T):
            w = h(w)
        obs[i] = w
    return np.median(obs)
%%time
compute_long_run_median_parallel()
1.8358907978263252
18.4.1 A Warning
Parallelization works well in the outer loop of the last example because the individual tasks inside the loop are independent
of each other.
If this independence fails then parallelization is often problematic.
For example, each step inside the inner loop depends on the last step, so independence fails, and this is why we use
ordinary range instead of prange.
When you see us using prange in later lectures, it is because the independence of tasks holds true.
When you see us using ordinary range in a jitted function, it is either because the speed gain from parallelization is
small or because independence fails.
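To make the distinction concrete, here is a small sketch of our own: the first loop has dependent steps and must use range, while the second has independent steps and can use prange.

import numpy as np
from numba import njit, prange

@njit
def cumsum_serial(x):
    # Each iteration needs the previous one's result, so `range` is required
    out = np.empty_like(x)
    total = 0.0
    for i in range(len(x)):
        total += x[i]
        out[i] = total
    return out

@njit(parallel=True)
def square_parallel(x):
    # Iterations are independent of each other, so `prange` is safe
    out = np.empty_like(x)
    for i in prange(len(x)):
        out[i] = x[i]**2
    return out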
18.5 Exercises
Exercise 18.5.1
In an earlier exercise, we used Numba to accelerate an effort to compute the constant 𝜋 by Monte Carlo.
Now try adding parallelization and see if you get further speed gains.
You should not expect huge gains here because, while there are many independent tasks (draw point and test if in circle),
each one has low execution time.
Generally speaking, parallelization is less effective when the individual tasks to be parallelized are very small relative to
total execution time.
This is due to overheads associated with spreading all of these small tasks across multiple CPUs.
Nevertheless, with suitable hardware, it is possible to get nontrivial speed gains in this exercise.
For the size of the Monte Carlo simulation, use something substantial, such as n = 100_000_000.
from random import uniform
from numba import njit, prange

@njit(parallel=True)
def calculate_pi(n=1_000_000):
    count = 0
    for i in prange(n):
        u, v = uniform(0, 1), uniform(0, 1)
        d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
        if d < 0.5:
            count += 1
    area_estimate = count / n
    return area_estimate * 4  # dividing by radius**2
%time calculate_pi()

CPU times: user 436 ms, sys: 7.74 ms, total: 444 ms
Wall time: 431 ms

3.13938

Now we time it again, without the compilation overhead:

%time calculate_pi()

3.143884
By switching parallelization on and off (selecting True or False in the @njit annotation), we can test the speed gain
that multithreading provides on top of JIT compilation.
On our workstation, we find that parallelization increases execution speed by a factor of 2 or 3.
(If you are executing locally, you will get different numbers, depending mainly on the number of CPUs on your machine.)
Exercise 18.5.2
In our lecture on SciPy, we discussed pricing a call option in a setting where the underlying stock price had a simple and
well-known distribution.
Here we discuss a more realistic setting.
We recall that the price of the option obeys
$$P = \beta^n \, \mathbb{E} \max\{S_n - K, 0\}$$
where
1. 𝛽 is a discount factor,
2. 𝑛 is the expiry date,
3. 𝐾 is the strike price and
4. {𝑆𝑡 } is the price of the underlying asset at each time 𝑡.
Suppose that n, β, K = 20, 0.99, 100.
Assume that the stock price obeys
$$\ln \frac{S_{t+1}}{S_t} = \mu + \sigma_t \, \xi_{t+1}$$

where $\{\xi_t\}$ is IID and standard normal, and where volatility is itself stochastic, with $\sigma_t = \exp(h_t)$ and

$$h_{t+1} = \rho \, h_t + \nu \, \eta_{t+1}$$

for another IID standard normal sequence $\{\eta_t\}$.

The task is to estimate the price by Monte Carlo, averaging over $M$ independent simulations of the terminal price:

$$P = \beta^n \, \mathbb{E} \max\{S_n - K, 0\} \approx \hat{P}_M := \beta^n \frac{1}{M} \sum_{m=1}^{M} \max\{S_n^m - K, 0\}$$

where $S_n^m$ is the time-$n$ price in the $m$-th simulation. Use Numba with parallelization to speed up the loop over simulations.
@njit(parallel=True)
def compute_call_price_parallel(β=β,
                                μ=μ,
                                S0=S0,
                                h0=h0,
                                K=K,
                                n=n,
                                ρ=ρ,
                                ν=ν,
                                M=M):
    current_sum = 0.0
    # For each sample path
    for m in prange(M):
        s = np.log(S0)
        h = h0
        # Simulate forward in time
        for t in range(n):
            s = s + μ + np.exp(h) * randn()
            h = ρ * h + ν * randn()
        # And add the payoff max(S_n - K, 0) to current_sum
        current_sum += np.maximum(np.exp(s) - K, 0)
    return β**n * current_sum / M
Try swapping between parallel=True and parallel=False and noting the run time.
If you are on a machine with many CPUs, the difference should be significant.
NINETEEN
JAX
New website
We have replaced this lecture with a new lecture series on quantitative economics using JAX:
See Quantitative Economics with JAX
CHAPTER
TWENTY
WRITING GOOD CODE
“Any fool can write code that a computer can understand. Good programmers write code that humans can
understand.” – Martin Fowler
20.1 Overview
When computer programs are small, poorly written code is not overly costly.
But more data, more sophisticated models, and more computer power are enabling us to take on more challenging problems
that involve writing longer programs.
For such programs, investment in good coding practices will pay high returns.
The main payoffs are higher productivity and faster code.
In this lecture, we review some elements of good coding practice.
We also touch on modern developments in scientific computing — such as just in time compilation — and how they affect
good program design.
20.2 An Example of Poor Code

The code below generates and plots time series of capital for the Solow growth model

$$k_{t+1} = s k_t^{\alpha} + (1 - \delta) k_t \tag{20.1}$$

under three different parameterizations.

Here
• 𝑘𝑡 is capital at time 𝑡 and
• 𝑠, 𝛼, 𝛿 are parameters (savings, a productivity parameter and depreciation)
For each parameterization, the code
1. sets 𝑘0 = 1
2. iterates using (20.1) to produce a sequence 𝑘0 , 𝑘1 , 𝑘2 … , 𝑘𝑇
3. plots the sequence
The plots will be grouped into three subfigures.
In each subfigure, two parameters are held fixed while another varies
import numpy as np
import matplotlib.pyplot as plt

# Allocate memory for time series
k = np.empty(50)

fig, axes = plt.subplots(3, 1, figsize=(8, 16))

# Trajectories with different α
δ = 0.1
s = 0.4
α = (0.25, 0.33, 0.45)

for j in range(3):
    k[0] = 1
    for t in range(49):
        k[t+1] = s * k[t]**α[j] + (1 - δ) * k[t]
    axes[0].plot(k, 'o-', label=rf"$\alpha = {α[j]},\; s = {s},\; \delta={δ}$")

axes[0].grid(lw=0.2)
axes[0].set_ylim(0, 18)
axes[0].set_xlabel('time')
axes[0].set_ylabel('capital')
axes[0].legend(loc='upper left', frameon=True)

# Trajectories with different s
δ = 0.1
α = 0.33
s = (0.3, 0.4, 0.5)

for j in range(3):
    k[0] = 1
    for t in range(49):
        k[t+1] = s[j] * k[t]**α + (1 - δ) * k[t]
    axes[1].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s[j]},\; \delta={δ}$")

axes[1].grid(lw=0.2)
axes[1].set_xlabel('time')
axes[1].set_ylabel('capital')
axes[1].set_ylim(0, 18)
axes[1].legend(loc='upper left', frameon=True)

# Trajectories with different δ
δ = (0.05, 0.1, 0.15)
α = 0.33
s = 0.4

for j in range(3):
    k[0] = 1
    for t in range(49):
        k[t+1] = s * k[t]**α + (1 - δ[j]) * k[t]
    axes[2].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s},\; \delta={δ[j]}$")

axes[2].set_ylim(0, 18)
axes[2].set_xlabel('time')
axes[2].set_ylabel('capital')
axes[2].grid(lw=0.2)
axes[2].legend(loc='upper left', frameon=True)

plt.show()
20.3 Good Coding Practice

There are usually many different ways to write a program that accomplishes a given task.
For small programs, like the one above, the way you write code doesn’t matter too much.
But if you are ambitious and want to produce useful things, you’ll write medium to large programs too.
In those settings, coding style matters a great deal.
Fortunately, lots of smart people have thought about the best way to write code.
Here are some basic precepts.
20.3.1 Don’t Use Magic Numbers

If you look at the code above, you’ll see numbers like 50 and 49 and 3 scattered through the code.
These kinds of numeric literals in the body of your code are sometimes called “magic numbers”.
This is not a compliment.
While numeric literals are not all evil, the numbers shown in the program above should certainly be replaced by named
constants.
For example, the code above could declare the variable time_series_length = 50.
Then in the loops, 49 should be replaced by time_series_length - 1.
The advantages are:
• the meaning is much clearer throughout
• to alter the time series length, you only need to change one value
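As a sketch of the suggested change (the constant name is our own choice):

import numpy as np

time_series_length = 50       # named constant replaces the magic numbers 50 and 49
s, α, δ = 0.4, 0.33, 0.1      # example parameter values

k = np.empty(time_series_length)
k[0] = 1
for t in range(time_series_length - 1):
    k[t+1] = s * k[t]**α + (1 - δ) * k[t]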
20.3.2 Don’t Repeat Yourself

Another glaring deficiency of the code above is repetition: essentially the same loop and plotting logic appears three times, once for each subfigure.

Repeated code is harder to maintain, since every change has to be made correctly in each copy.

20.3.3 Minimize Global Variables

Sure, global variables (i.e., names assigned to values outside of any function or class) are convenient.
Rookie programmers typically use global variables with abandon — as we once did ourselves.
But global variables are dangerous, especially in medium to large size programs, since
• they can affect what happens in any part of your program
• they can be changed by any function
This makes it much harder to be certain about what some small part of a given piece of code actually commands.
Here’s a useful discussion on the topic.
While the odd global in small scripts is no big deal, we recommend that you teach yourself to avoid them.
(We’ll discuss how just below).
JIT Compilation
For scientific computing, there is another good reason to avoid global variables.
As we’ve seen in previous lectures, JIT compilation can generate excellent performance for scripting languages like Python.
But the task of the compiler used for JIT compilation becomes harder when global variables are present.
Put differently, the type inference required for JIT compilation is safer and more effective when variables are sandboxed
inside a function.
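A minimal sketch of the sandboxing idea (our own example):

from numba import njit

@njit
def update_capital(k, s=0.4, α=0.33, δ=0.1):
    # Every name here is an argument or local variable, so type
    # inference is straightforward for the JIT compiler
    return s * k**α + (1 - δ) * k

print(update_capital(1.0))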
20.3.4 Use Functions or Classes

Fortunately, we can easily avoid the evils of global variables and WET code.

• WET stands for “we enjoy typing” and is the opposite of DRY (“don’t repeat yourself”).
We can do this by making frequent use of functions or classes.
In fact, functions and classes are designed specifically to help us avoid shaming ourselves by repeating code or excessive
use of global variables.
Both can be useful, and in fact they work well with each other.
We’ll learn more about these topics over time.
(Personal preference is part of the story too)
What’s really important is that you use one or the other or both.
20.4 Revisiting the Example

Here’s some code that reproduces the plot above with better coding style:

from itertools import product

def plot_path(ax, αs, s_vals, δs, series_length=50):
    """
    Add a time series plot to the axes ax for all given parameters.
    """
    k = np.empty(series_length)

    for (α, s, δ) in product(αs, s_vals, δs):
        k[0] = 1
        for t in range(series_length-1):
            k[t+1] = s * k[t]**α + (1 - δ) * k[t]
        ax.plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s},\; \delta = {δ}$")

    ax.set_xlabel('time')
    ax.set_ylabel('capital')
    ax.set_ylim(0, 18)
    ax.legend(loc='upper left', frameon=True)

fig, axes = plt.subplots(3, 1, figsize=(8, 16))

# Parameters (αs, s_vals, δs)
set_one = ([0.25, 0.33, 0.45], [0.4], [0.1])
set_two = ([0.33], [0.3, 0.4, 0.5], [0.1])
set_three = ([0.33], [0.4], [0.05, 0.1, 0.15])

for (ax, params) in zip(axes, (set_one, set_two, set_three)):
    αs, s_vals, δs = params
    plot_path(ax, αs, s_vals, δs)

plt.show()
20.5 Exercises
Exercise 20.5.1
Here is some code that needs improving.
It involves a basic supply and demand problem.
Supply is given by

$$q_s(p) = \exp(\alpha p) - \beta$$

and demand is

$$q_d(p) = \gamma p^{-\delta}.$$

The equilibrium price $p^*$ solves $q_s(p^*) = q_d(p^*)$. From this we get the equilibrium quantity $q^* = q_s(p^*)$.
The parameter values will be
• 𝛼 = 0.1
• 𝛽=1
• 𝛾=1
• 𝛿=1
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import brentq

# Compute equilibrium
def h(p):
    return p**(-1) - (np.exp(0.1 * p) - 1)  # demand - supply

p_star = brentq(h, 2, 4)
q_star = np.exp(0.1 * p_star) - 1

print(f'Equilibrium price is {p_star: .2f}')
print(f'Equilibrium quantity is {q_star: .2f}')

# Now plot
grid = np.linspace(2, 4, 100)
fig, ax = plt.subplots()

qs = np.exp(0.1 * grid) - 1
qd = grid**(-1)

ax.plot(grid, qs, 'b-', lw=2, label='supply')
ax.plot(grid, qd, 'r-', lw=2, label='demand')

ax.set_xlabel('price')
ax.set_ylabel('quantity')
ax.legend(loc='upper center')

plt.show()
Let’s also see what happens when demand shifts up, with γ rising to 1.25:

# Compute equilibrium
def h(p):
    return 1.25 * p**(-1) - (np.exp(0.1 * p) - 1)  # demand - supply

p_star = brentq(h, 2, 4)
q_star = np.exp(0.1 * p_star) - 1

print(f'Equilibrium price is {p_star: .2f}')
print(f'Equilibrium quantity is {q_star: .2f}')

# Now plot
p_grid = np.linspace(2, 4, 100)
fig, ax = plt.subplots()

qs = np.exp(0.1 * p_grid) - 1
qd = 1.25 * p_grid**(-1)

ax.plot(p_grid, qs, 'b-', lw=2, label='supply')
ax.plot(p_grid, qd, 'r-', lw=2, label='demand')

ax.set_xlabel('price')
ax.set_ylabel('quantity')
ax.legend(loc='upper center')

plt.show()
Now we might consider supply shifts, but you already get the idea that there’s a lot of repeated code here.
Refactor and improve clarity in the code above using the principles discussed in this lecture.
Here’s one solution, which collects parameters and logic into a class:

class Equilibrium:

    def __init__(self, α=0.1, β=1, γ=1, δ=1):
        self.α, self.β, self.γ, self.δ = α, β, γ, δ

    def qs(self, p):
        return np.exp(self.α * p) - self.β

    def qd(self, p):
        return self.γ * p**(-self.δ)

    def compute_equilibrium(self):
        def h(p):
            return self.qd(p) - self.qs(p)
        p_star = brentq(h, 2, 4)
        q_star = np.exp(self.α * p_star) - self.β

        print(f'Equilibrium price is {p_star: .2f}')
        print(f'Equilibrium quantity is {q_star: .2f}')

    def plot_equilibrium(self):
        # Now plot
        grid = np.linspace(2, 4, 100)
        fig, ax = plt.subplots()

        ax.plot(grid, self.qs(grid), 'b-', lw=2, label='supply')
        ax.plot(grid, self.qd(grid), 'r-', lw=2, label='demand')

        ax.set_xlabel('price')
        ax.set_ylabel('quantity')
        ax.legend(loc='upper center')

        plt.show()
eq = Equilibrium()
eq.compute_equilibrium()
eq.plot_equilibrium()
One of the nice things about our refactored code is that, when we change parameters, we don’t need to repeat ourselves:
eq.γ = 1.25
eq.compute_equilibrium()
eq.plot_equilibrium()
TWENTYONE
MORE LANGUAGE FEATURES
21.1 Overview
With this last lecture, our advice is to skip it on first pass, unless you have a burning desire to read it.
It’s here
1. as a reference, so we can link back to it when required, and
2. for those who have worked through a number of applications, and now want to learn more about the Python language
A variety of topics are treated in the lecture, including iterators, decorators and descriptors, and generators.
21.2 Iterables and Iterators

21.2.1 Iterators

Formally, an iterator is an object with a __next__ method. For example, file objects are iterators. To see this, let’s first write a small data file:
%%file us_cities.txt
new york: 8244910
los angeles: 3819702
chicago: 2707120
houston: 2145146
philadelphia: 1536471
phoenix: 1469471
san antonio: 1359758
san diego: 1326179
dallas: 1223229
Writing us_cities.txt
f = open('us_cities.txt')
f.__next__()

'new york: 8244910\n'

f.__next__()

'los angeles: 3819702\n'
We see that file objects do indeed have a __next__ method, and that calling this method returns the next line in the file.
The next method can also be accessed via the builtin function next(), which directly calls this method
next(f)
'chicago: 2707120\n'
The objects returned by enumerate() are also iterators:

e = enumerate(['foo', 'bar'])
next(e)
(0, 'foo')
next(e)
(1, 'bar')
%%file test_table.csv
Date,Open,High,Low,Close,Volume,Adj Close
2009-05-21,9280.35,9286.35,9189.92,9264.15,133200,9264.15
2009-05-20,9372.72,9399.40,9311.61,9344.64,143200,9344.64
2009-05-19,9172.56,9326.75,9166.97,9290.29,167000,9290.29
2009-05-18,9167.05,9167.82,8997.74,9038.69,147800,9038.69
2009-05-15,9150.21,9272.08,9140.90,9265.02,172000,9265.02
2009-05-14,9212.30,9223.77,9052.41,9093.73,169400,9093.73
2009-05-13,9305.79,9379.47,9278.89,9340.49,176000,9340.49
2009-05-12,9358.25,9389.61,9298.61,9298.61,188400,9298.61
2009-05-11,9460.72,9503.91,9342.75,9451.98,230800,9451.98
2009-05-08,9351.40,9464.43,9349.57,9432.83,220200,9432.83
Writing test_table.csv
Iterators also appear in the csv module, whose reader objects step through the rows of a csv file:

from csv import reader

f = open('test_table.csv', 'r')
nikkei_data = reader(f)
next(nikkei_data)

['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']

next(nikkei_data)

['2009-05-21', '9280.35', '9286.35', '9189.92', '9264.15', '133200', '9264.15']
21.2.2 Iterators in For Loops

All iterators can be placed to the right of the in keyword in for loop statements.

In fact this is how the for loop works: if we write

for x in iterator:
    <code block>

then the interpreter calls iterator.__next__() and binds x to the result, executes the code block, and repeats until a StopIteration error occurs. So, for example, a file can be stepped through directly:

f = open('somefile.txt', 'r')
for line in f:
    # do something with line
    pass
21.2.3 Iterables
You already know that we can put a Python list to the right of in in a for loop

for i in ['spam', 'eggs']:
    print(i)

spam
eggs
So does that mean lists are iterators? They are not:

x = ['foo', 'bar']
type(x)

list

next(x)

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[12], line 1
----> 1 next(x)

TypeError: 'list' object is not an iterator
Rather, a list is an iterable: applying the built-in function iter() to it produces an iterator

x = ['foo', 'bar']
type(x)

list

y = iter(x)
type(y)
list_iterator
next(y)
'foo'
next(y)
'bar'
next(y)
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
Cell In[17], line 1
----> 1 next(y)

StopIteration:
In contrast, objects of type int are not iterable:

iter(42)

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[18], line 1
----> 1 iter(42)

TypeError: 'int' object is not iterable
Some built-in functions that act on sequences also work with iterables
• max(), min(), sum(), all(), any()
For example
x = [10, -10]
max(x)
10
y = iter(x)
type(y)
list_iterator
max(y)
10
One thing to remember about iterators is that they are depleted by use
x = [10, -10]
y = iter(x)
max(y)
10
max(y)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[23], line 1
----> 1 max(y)
21.3 * and ** Operators

* and ** are convenient and widely used tools to unpack lists and tuples and to allow users to define functions that take arbitrarily many arguments as input.

In this section, we will explore how to use them and distinguish their use cases.
When we operate on a list of parameters, we often need to extract the content of the list as individual arguments instead
of a collection when passing them into functions.
Luckily, the * operator can help us to unpack lists and tuples into positional arguments in function calls.
To make things concrete, consider the following examples.

Without *, the print function prints a list:

l1 = ['a', 'b', 'c']
print(l1)

['a', 'b', 'c']

With *, it prints the individual elements, since * unpacks the list into positional arguments:

print(*l1)

a b c

Unpacking the list using * into positional arguments is equivalent to passing the elements individually when calling the function:

print('a', 'b', 'c')

a b c
l1.append('d')
print(*l1)
a b c d
import numpy as np
import matplotlib.pyplot as plt
# Use * to unpack tuple βs and the tuple output from the generate_data function
# Use ** to unpack the dictionary of keyword arguments for lines
ax[idx].plot(*generate_data(*βs), **line_kargs)
In this example, * unpacked the zipped parameters βs and the output of generate_data function stored in tuples,
while ** unpacked graphical parameters stored in legend_kargs and line_kargs.
To summarize, when *list/*tuple and **dictionary are passed into function calls, they are unpacked into
individual arguments instead of a collection.
The difference is that * will unpack lists and tuples into positional arguments, while ** will unpack dictionaries into
keyword arguments.
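Here is a compact sketch of our own pulling the two cases together:

def utility(c, l, γ=2.0):
    # A CRRA-style utility function used purely for illustration
    return c**(1 - γ) / (1 - γ) + l

args = (1.5, 0.8)      # positional arguments stored in a tuple
kwargs = {'γ': 3.0}    # keyword arguments stored in a dictionary

# Equivalent to utility(1.5, 0.8, γ=3.0)
print(utility(*args, **kwargs))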
When we define functions, it is sometimes desirable to allow users to put as many arguments as they want into a function.
You might have noticed that the ax.plot() function could handle arbitrarily many arguments.
If we look at the documentation of the function, we can see it is declared essentially as plot(*args, **kwargs), where *args collects arbitrarily many positional arguments.

We can mimic this in our own functions:

l1 = ['a', 'b', 'c']
l2 = ['b', 'c', 'd']

def arb(*ls):
    print(ls)

arb(l1, l2)

(['a', 'b', 'c'], ['b', 'c', 'd'])

The inputs are passed into the function and stored in a tuple.

Let’s try more inputs:

l3 = ['z', 'x', 'b']
arb(l1, l2, l3)

(['a', 'b', 'c'], ['b', 'c', 'd'], ['z', 'x', 'b'])
Similarly, Python allows us to use **kargs to pass arbitrarily many keyword arguments into functions:

def arb(**ls):
    print(ls)

arb(l1=l1, l2=l2, l3=l3)

{'l1': ['a', 'b', 'c'], 'l2': ['b', 'c', 'd'], 'l3': ['z', 'x', 'b']}
Overall, *args and **kargs are used when defining a function; they enable the function to take input with an arbitrary
size.
The difference is that functions with *args will be able to take positional arguments with an arbitrary size, while
**kargs will allow functions to take arbitrarily many keyword arguments.
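The two can be combined in one signature; a small sketch of our own:

def describe(*args, **kwargs):
    # args collects extra positional arguments into a tuple,
    # kwargs collects extra keyword arguments into a dictionary
    print('positional:', args)
    print('keyword:   ', kwargs)

describe(1, 2, 3, model='solow', periods=50)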
21.4 Decorators and Descriptors

Let’s look at some special syntax elements that are routinely used by Python developers.
You might not need the following concepts immediately, but you will see them in other people’s code.
Hence you need to understand them at some stage of your Python education.
21.4.1 Decorators
Decorators are a bit of syntactic sugar that, while easily avoided, have turned out to be popular.
It’s very easy to say what decorators do.
On the other hand it takes a bit of effort to explain why you might use them.
An Example
import numpy as np
def f(x):
return np.log(np.log(x))
def g(x):
return np.sqrt(42 * x)
Now suppose there’s a problem: occasionally negative numbers get fed to f and g in the calculations that follow.
If you try it, you’ll see that when these functions are called with negative numbers they return a NumPy object called nan
.
This stands for “not a number” (and indicates that you are trying to evaluate a mathematical function at a point where it
is not defined).
Perhaps this isn’t what we want, because it causes other problems that are hard to pick up later on.
Suppose that instead we want the program to terminate whenever this happens, with a sensible error message.
This change is easy enough to implement
import numpy as np

def f(x):
    assert x >= 0, "Argument must be nonnegative"
    return np.log(np.log(x))

def g(x):
    assert x >= 0, "Argument must be nonnegative"
    return np.sqrt(42 * x)
Notice however that there is some repetition here, in the form of two identical lines of code.
Repetition makes our code longer and harder to maintain, and hence is something we try hard to avoid.
Here it’s not a big deal, but imagine now that instead of just f and g, we have 20 such functions that we need to modify
in exactly the same way.
This means we need to repeat the test logic (i.e., the assert line testing nonnegativity) 20 times.
The situation is still worse if the test logic is longer and more complicated.
In this kind of scenario the following approach would be neater
import numpy as np

def check_nonneg(func):
    def safe_function(x):
        assert x >= 0, "Argument must be nonnegative"
        return func(x)
    return safe_function

def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

f = check_nonneg(f)
g = check_nonneg(g)

# Program continues with various calculations using f and g
Enter Decorators

The last version of the code works, but the wrapping step f = check_nonneg(f) sits apart from the definition of f. Decorator notation achieves the same thing more cleanly: it lets us replace

def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

f = check_nonneg(f)
g = check_nonneg(g)

with

@check_nonneg
def f(x):
    return np.log(np.log(x))

@check_nonneg
def g(x):
    return np.sqrt(42 * x)

These two pieces of code do exactly the same thing.
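To check that the decorated functions behave as intended, here is a quick test of our own:

import numpy as np

def check_nonneg(func):
    def safe_function(x):
        assert x >= 0, "Argument must be nonnegative"
        return func(x)
    return safe_function

@check_nonneg
def g(x):
    return np.sqrt(42 * x)

print(g(1.0))    # runs normally
# g(-1.0)        # would terminate with AssertionError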
21.4.2 Descriptors

Descriptors give us fine-grained control over attribute access. To see why that’s useful, consider a Car class that records the same distance in both miles and kilometers:

class Car:

    def __init__(self, miles=1000):
        self.miles = miles
        self.kms = miles * 1.61

    # Some other functionality, details omitted
One potential problem we might have here is that a user alters one of these variables but not the other
car = Car()
car.miles
1000
car.kms
1610.0
car.miles = 6000
car.kms
1610.0
In the last two lines we see that miles and kms are out of sync.
What we really want is some mechanism whereby each time a user sets one of these variables, the other is automatically
updated.
A Solution
class Car:

    def __init__(self, miles=1000):
        self._miles = miles
        self._kms = miles * 1.61

    def set_miles(self, value):
        self._miles = value
        self._kms = value * 1.61

    def set_kms(self, value):
        self._kms = value
        self._miles = value / 1.61

    def get_miles(self):
        return self._miles

    def get_kms(self):
        return self._kms

    miles = property(get_miles, set_miles)
    kms = property(get_kms, set_kms)
car = Car()
car.miles
1000
car.miles = 6000
car.kms
9660.0
How it Works
The names _miles and _kms are arbitrary names we are using to store the values of the variables.
The objects miles and kms are properties, a common kind of descriptor.
The methods get_miles, set_miles, get_kms and set_kms define what happens when you get (i.e. access) or
set (bind) these variables
• So-called “getter” and “setter” methods.
The builtin Python function property takes getter and setter methods and creates a property.
For example, after car is created as an instance of Car, the object car.miles is a property.
Being a property, when we set its value via car.miles = 6000 its setter method is triggered — in this case
set_miles.
These days it’s very common to see the property function used via a decorator.
Here’s another version of our Car class that works as before but now uses decorators to set up the properties
class Car:

    def __init__(self, miles=1000):
        self._miles = miles
        self._kms = miles * 1.61

    @property
    def miles(self):
        return self._miles

    @property
    def kms(self):
        return self._kms

    @miles.setter
    def miles(self, value):
        self._miles = value
        self._kms = value * 1.61

    @kms.setter
    def kms(self, value):
        self._kms = value
        self._miles = value / 1.61
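A quick check of our own that the two properties now stay synchronized in both directions, assuming the Car class above has been run:

car = Car()
car.kms = 1610.0      # triggers the kms setter, which also updates _miles
print(car.miles)      # 1000.0
car.miles = 6000      # triggers the miles setter, which also updates _kms
print(car.kms)        # 9660.0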
21.5 Generators

A generator is a kind of iterator (i.e., it works with a next function). We will study two ways to build generators: generator expressions and generator functions.

21.5.1 Generator Expressions

The easiest way to build generators is using generator expressions, which look just like list comprehensions but with round brackets. Here is the list comprehension case:

singular = ('dog', 'cat', 'bird')
type(singular)

tuple

plural = [string + 's' for string in singular]
plural

['dogs', 'cats', 'birds']

type(plural)

list

And here is the generator expression:

singular = ('dog', 'cat', 'bird')
plural = (string + 's' for string in singular)
type(plural)

generator

next(plural)

'dogs'

next(plural)

'cats'

next(plural)

'birds'

The same idea can be used to avoid building intermediate lists in function calls. For example,

sum((x * x for x in range(10)))

285

The function sum() calls next() to get the items, adds successive terms.

In fact, we can omit the outer brackets in this case

sum(x * x for x in range(10))

285
21.5.2 Generator Functions

The most flexible way to create generator objects is to use generator functions.

Let’s look at some examples.
Example 1
def f():
    yield 'start'
    yield 'middle'
    yield 'end'
It looks like a function, but uses a keyword yield that we haven’t met before.
Let’s see how it works after running this code
type(f)
function
gen = f()
gen
next(gen)
'start'
next(gen)
'middle'
next(gen)
'end'
next(gen)
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
Cell In[62], line 1
----> 1 next(gen)
StopIteration:
The generator function f() is used to create generator objects (in this case gen).
Generators are iterators, because they support a next method.
The first call to next(gen)

• Executes code in the body of f() until it meets a yield statement.
• Returns the yielded value ('start' here) to the caller of next(gen).

The second call to next(gen) starts executing from the next line

def f():
    yield 'start'
    yield 'middle'  # This line!
    yield 'end'

and continues until the next yield statement is reached, returning that value, and so on.
Example 2
def g(x):
    while x < 100:
        yield x
        x = x * x
g

<function __main__.g(x)>
gen = g(2)
type(gen)
generator
next(gen)

2

next(gen)

4

next(gen)

16
next(gen)
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
Cell In[70], line 1
----> 1 next(gen)
StopIteration:
When we call next(gen), execution picks up from where it last left off, i.e., the line after the yield statement:

def g(x):
    while x < 100:
        yield x
        x = x * x  # execution continues from here

When x = 256, the loop condition x < 100 fails, the function terminates, and a StopIteration exception is raised.
Incidentally, if we want the sequence to continue indefinitely, we can make the loop condition always true:

def g(x):
    while 1:
        yield x
        x = x * x
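With this change the generator never terminates on its own; each call to next just keeps producing values (a quick check of our own, reusing g from above):

gen = g(2)
print(next(gen))   # 2
print(next(gen))   # 4
print(next(gen))   # 16
print(next(gen))   # 256; no StopIteration, since the loop never ends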
21.5.3 Advantages of Iterators

What’s the advantage of building sequences lazily with generators? To illustrate, let’s count the number of “successes” in a large number of binary draws, first using a list:

import random

n = 10000000
draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
sum(draws)

4999971

But we are creating a huge list here, draws, all of which must be held in memory at once. (Note that in Python 3, range(n) is lazy and does not itself build a list.)

This uses lots of memory and is relatively slow.

If we make n even bigger, we may exhaust available memory:

n = 100000000
draws = [random.uniform(0, 1) < 0.5 for i in range(n)]

We can avoid these problems using a generator function, which produces the draws one at a time:

def f(n):
    i = 1
    while i <= n:
        yield random.uniform(0, 1) < 0.5
        i += 1
n = 10000000
draws = f(n)
draws
sum(draws)
4998483
In summary, iterables
• avoid the need to create big lists/tuples, and
• provide a uniform interface to iteration that can be used transparently in for loops
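One way to see the memory savings directly is to compare object sizes (a sketch of our own):

import sys
import random

n = 100_000
big_list = [random.uniform(0, 1) < 0.5 for _ in range(n)]
lazy = (random.uniform(0, 1) < 0.5 for _ in range(n))

print(sys.getsizeof(big_list))  # grows linearly with n
print(sys.getsizeof(lazy))      # small and roughly constant, regardless of n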
21.6 Exercises
Exercise 21.6.1
Complete the following code, and test it using this csv file, which we assume that you’ve put in your current working directory:

def column_iterator(target_file, column_number):
    """A generator function for CSV files.
    When called with a file name target_file (string) and column number
    column_number (integer), the generator function returns a generator
    that steps through the elements of column column_number in file
    target_file.
    """
    # put your code here

dates = column_iterator('test_table.csv', 1)

for date in dates:
    print(date)

One solution is as follows:

def column_iterator(target_file, column_number):
    """A generator function for CSV files.
    When called with a file name target_file (string) and column number
    column_number (integer), the generator function returns a generator
    that steps through the elements of column column_number in file
    target_file.
    """
    f = open(target_file, 'r')
    for line in f:
        yield line.split(',')[column_number - 1]
    f.close()

dates = column_iterator('test_table.csv', 1)

i = 1
for date in dates:
    print(date)
    if i == 10:
        break
    i += 1
Date
2009-05-21
2009-05-20
2009-05-19
2009-05-18
2009-05-15
2009-05-14
2009-05-13
2009-05-12
2009-05-11
TWENTYTWO
DEBUGGING AND HANDLING ERRORS
“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly
as possible, you are, by definition, not smart enough to debug it.” – Brian Kernighan
22.1 Overview
Are you one of those programmers who fills their code with print statements when trying to debug their programs?
Hey, we all used to do that.
(OK, sometimes we still do that…)
But once you start writing larger programs you’ll need a better system.
You may also want to handle potential errors in your code as they occur.
In this lecture, we will discuss how to debug our programs and improve error handling.
22.2 Debugging
Debugging tools for Python vary across platforms, IDEs and editors.
For example, a visual debugger is available in JupyterLab.
Here we’ll focus on Jupyter Notebook and leave you to explore other settings.
We’ll need the following imports
import numpy as np
import matplotlib.pyplot as plt
22.2.1 The debug Magic

Let’s consider a simple (and rather contrived) example:

def plot_log():
    fig, ax = plt.subplots(2, 1)
    x = np.linspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()  # Call the function, generate plot
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[2], line 7
4 ax.plot(x, np.log(x))
5 plt.show()
----> 7 plot_log()

AttributeError: 'numpy.ndarray' object has no attribute 'plot'
This code is intended to plot the log function over the interval [1, 2].
But there’s an error here: plt.subplots(2, 1) should be just plt.subplots().
(The call plt.subplots(2, 1) returns a NumPy array containing two axes objects, suitable for having two subplots
on the same figure)
The traceback shows that the error occurs at the method call ax.plot(x, np.log(x)).
The error occurs because we have mistakenly made ax a NumPy array, and a NumPy array has no plot method.
But let’s pretend that we don’t understand this for the moment.
We might suspect there’s something wrong with ax but when we try to investigate this object, we get the following
exception:
ax
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[3], line 1
----> 1 ax
The problem is that ax was defined inside plot_log(), and the name is lost once that function terminates.
Let’s try doing it a different way.
We run the first cell block again, generating the same error
def plot_log():
    fig, ax = plt.subplots(2, 1)
    x = np.linspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[4], line 7
4 ax.plot(x, np.log(x))
5 plt.show()
----> 7 plot_log()

AttributeError: 'numpy.ndarray' object has no attribute 'plot'

But this time, we immediately type %debug in the next cell:

%debug
You should be dropped into a new prompt that looks something like this
ipdb>
ipdb> ax
array([<matplotlib.axes.AxesSubplot object at 0x290f5d0>,
<matplotlib.axes.AxesSubplot object at 0x2930810>], dtype=object)
It’s now very clear that ax is an array, which clarifies the source of the problem.
To find out what else you can do from inside ipdb (or pdb), use the online help
ipdb> h
Undocumented commands:
======================
retval rv
ipdb> h c
c(ont(inue))
Continue execution, only stop when a breakpoint is encountered.
22.2.2 Setting a Break Point

The preceding approach is handy but sometimes insufficient. Consider the following modified version of our function:

def plot_log():
    fig, ax = plt.subplots()
    x = np.logspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()
Here the original problem is fixed, but we’ve accidentally written np.logspace(1, 2, 10) instead of np.linspace(1, 2, 10).
Now there won’t be any exception, but the plot won’t look right.
To investigate, it would be helpful if we could inspect variables like x during execution of the function.
To this end, we add a “break point” by inserting breakpoint() inside the function code block
def plot_log():
    breakpoint()
    fig, ax = plt.subplots()
    x = np.logspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()
Now let’s run the script, and investigate via the debugger
> <ipython-input-6-a188074383b7>(6)plot_log()
-> fig, ax = plt.subplots()
(Pdb) n
> <ipython-input-6-a188074383b7>(7)plot_log()
-> x = np.logspace(1, 2, 10)
(Pdb) n
> <ipython-input-6-a188074383b7>(8)plot_log()
-> ax.plot(x, np.log(x))
(Pdb) x
array([ 10.        ,  12.91549665,  16.68100537,  21.5443469 ,
        27.82559402,  35.93813664,  46.41588834,  59.94842503,
        77.42636827, 100.        ])
We used n twice to step forward through the code (one line at a time).
Then we printed the value of x to see what was happening with that variable.
To exit from the debugger, use q.
22.3 Handling Errors

Sometimes it’s possible to anticipate bugs and errors as we’re writing code.

For example, the unbiased sample variance of sample 𝑦1 , … , 𝑦𝑛 is defined as

$$s^2 := \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad \bar{y} = \text{sample mean}$$

This can be calculated in NumPy using np.var, but the formula fails when the sample size is one (division by $n - 1 = 0$), so we might want to check for that case explicitly.
22.3.1 Errors in Python

First, let’s review the kinds of errors Python produces. Here’s an example of a syntax error:

def f:

Since illegal syntax cannot be executed, a syntax error terminates execution of the program.
Here’s a different kind of error, unrelated to syntax
1 / 0
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
Cell In[7], line 1
----> 1 1 / 0

ZeroDivisionError: division by zero
Here’s another
x1 = y1
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[8], line 1
----> 1 x1 = y1

NameError: name 'y1' is not defined
And another
'foo' + 6
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[9], line 1
----> 1 'foo' + 6

TypeError: can only concatenate str (not "int") to str
And another
X = []
x = X[0]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[10], line 2
1 X = []
----> 2 x = X[0]

IndexError: list index out of range
22.3.2 Assertions
Sometimes errors can be avoided by checking whether your program runs as expected.
A relatively easy way to handle checks is with the assert keyword.
For example, pretend for a moment that the np.var function doesn’t exist and we need to write our own
def var(y):
    n = len(y)
    assert n > 1, 'Sample size must be greater than one.'
    return np.sum((y - y.mean())**2) / float(n-1)
If we run this with an array of length one, the program will terminate and print our error message
var([1])
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[12], line 1
----> 1 var([1])

AssertionError: Sample size must be greater than one.
The approach used above is a bit limited, because it always leads to termination.
Sometimes we can handle errors more gracefully, by treating special cases.
Let’s look at how this is done.
Catching Exceptions
We can catch and deal with exceptions using try – except blocks.
Here’s a simple example
def f(x):
    try:
        return 1.0 / x
    except ZeroDivisionError:
        print('Error: division by zero. Returned None')
    return None

f(2)

0.5

f(0)

Error: division by zero. Returned None

f(0.0)

Error: division by zero. Returned None
We can also catch multiple error types, handling each one differently:

def f(x):
    try:
        return 1.0 / x
    except ZeroDivisionError:
        print('Error: Division by zero. Returned None')
    except TypeError:
        print(f'Error: x cannot be of type {type(x)}. Returned None')
    return None

f(2)

0.5

f(0)

Error: Division by zero. Returned None

f('foo')

Error: x cannot be of type <class 'str'>. Returned None
If we don’t know the error type in advance, a bare except clause catches everything (use this sparingly, since it can mask unexpected bugs):

def f(x):
    try:
        return 1.0 / x
    except:
        print(f'Error. An issue has occurred with x = {x} of type: {type(x)}')
    return None

f(2)

0.5

f(0)

Error. An issue has occurred with x = 0 of type: <class 'int'>

f('foo')

Error. An issue has occurred with x = foo of type: <class 'str'>
22.4 Exercises
Exercise 22.4.1
Suppose we have a text file numbers.txt containing the following lines
prices
3
8
7
21
Using try – except, write a program to read in the contents of the file and sum the numbers, ignoring lines without
numbers.
You can use the open() function we learnt before to open numbers.txt.
%%file numbers.txt
prices
3
8
7
21
Writing numbers.txt
f = open('numbers.txt')

total = 0.0
for line in f:
    try:
        total += float(line)
    except ValueError:
        pass

f.close()

print(total)
39.0
Other
CHAPTER
TWENTYTHREE
TROUBLESHOOTING
This page is for readers experiencing errors when running the code from the lectures.
The basic assumption of the lectures is that code in a lecture should execute whenever
1. it is executed in a Jupyter notebook and
2. the notebook is running on a machine with the latest version of Anaconda Python.
You have installed Anaconda, haven’t you, following the instructions in this lecture?
Assuming that you have, the most common source of problems for our readers is that their Anaconda distribution is not
up to date.
Here’s a useful article on how to update Anaconda.
Another option is to simply remove Anaconda and reinstall.
You also need to keep the external code libraries, such as QuantEcon.py up to date.
For this task you can either
• use conda upgrade quantecon on the command line, or
• execute !conda upgrade quantecon within a Jupyter notebook.
If your local environment is still not working you can do two things.
First, you can use a remote machine instead, by clicking on the Launch Notebook icon available for each lecture
Second, you can report an issue, so we can try to fix your local set up.
We like getting feedback on the lectures so please don’t hesitate to get in touch.
One way to give feedback is to raise an issue through our issue tracker.
Please be as specific as possible. Tell us where the problem is and as much detail about your local set up as you can
provide.
Another feedback option is to use our discourse forum.
Finally, you can provide direct feedback to contact@quantecon.org
TWENTYFOUR
EXECUTION STATISTICS
!python --version
Python 3.12.7
!conda list
INDEX

B
Bisection, 210

C
Compiling Functions, 296

D
Data Sources, 236
Debugging, 363
Dynamic Typing, 153

I
Immutable, 113
Integration, 213
IPython, 19

J
Jupyter, 19
Jupyter Notebook
    Basics, 21
    Debugging, 28
    Help, 28
    nbviewer, 28
    Setup, 19
    Sharing, 28
Jupyter Notebooks, 19
JupyterLab, 34

L
Linear Algebra, 213

M
Matplotlib, 12, 187
    3D Plots, 196
    Multiple Plots on One Axis, 193
    Simple API, 187
    Subplots, 194
Models
    Code style, 325
Mutable, 113

N
NetworkX, 15
Newton-Raphson Method, 211
NumPy, 161, 205
    Arithmetic Operations, 168
    Arrays, 161
    Arrays (Creating), 163
    Arrays (Indexing), 165
    Arrays (Methods), 167
    Arrays (Shape and Dimension), 162
    Broadcasting, 170
    Comparisons, 178
    Matrix Multiplication, 169
    Universal Functions, 156
    Vectorized Functions, 177

O
Object-Oriented Programming
    Classes, 119
    Key Concepts, 118
    Methods, 123
    Special Methods, 131
OOP II: Building Classes, 117
Optimization, 212
    Multivariate, 213

P
Pandas, 219
    DataFrames, 222
    Series, 220
Pandas for Panel Data, 249
pandas_datareader, 239
Python, 17
    Anaconda, 18
    Assertions, 371
    common uses, 6
    Comparison, 79
    Conditions, 61
    Content, 92
    Cython, 304
    Data Types, 69
    Decorators, 350, 352, 354