Quantitative Economics With Python
CONTENTS
Programming in Python
1.1 About Python
1.2 Setting up Your Python Environment
1.3 An Introductory Example
1.4 Python Essentials
1.5 Object Oriented Programming
1.6 How it Works: Data, Variables and Names
1.7 More Language Features
1.8 NumPy
1.9 SciPy
1.10 Matplotlib
1.11 Pandas
1.12 IPython Shell and Notebook
1.13 The Need for Speed
Introductory Applications
2.1 Linear Algebra
2.2 Finite Markov Chains
2.3 Shortest Paths
2.4 Schelling's Segregation Model
2.5 LLN and CLT
2.6 Linear State Space Models
2.7 A First Look at the Kalman Filter
2.8 Infinite Horizon Dynamic Programming
2.9 LQ Control Problems
2.10 Rational Expectations Equilibrium
2.11 Markov Perfect Equilibrium
2.12 Markov Asset Pricing
2.13 The Permanent Income Model
Advanced Applications
3.1 Continuous State Markov Chains
3.2 The Lucas Asset Pricing Model
3.3 Modeling Career Choice
3.4 On-the-Job Search
3.5
3.6
3.7
3.8
3.9
3.10
3.11
3.12
Solutions
References
Note: You are currently viewing an automatically generated PDF version of our online lectures, which are located at

http://quant-econ.net

Please visit the website for more information on the aims and scope of the lectures and the two language options (Julia or Python). This PDF is generated from a set of source files that are oriented towards the website and to HTML output. As a result, the presentation quality can be less consistent than on the website.
CHAPTER
ONE
PROGRAMMING IN PYTHON
This first part of the course provides a relatively fast-paced introduction to the Python programming language
Overview
Common Uses Python is a general purpose language used in almost all application domains
communications
web development
CGI and graphical user interfaces
games
multimedia, data processing, security, etc., etc., etc.
Used extensively by Internet service and high tech companies such as
Google
Dropbox
Reddit
YouTube
Over the last decade, Python has become one of the core languages of scientific computing
This section briefly showcases some examples of Python for scientific programming
All of these topics will be covered in detail later on
Numerical programming Fundamental matrix and array processing capabilities are provided by the excellent NumPy library

NumPy provides the basic array data type plus some simple processing operations

For example

In [1]: import numpy as np

In [2]: a = np.linspace(-np.pi, np.pi, 100)   # Create an evenly spaced grid (illustrative setup)

In [3]: b = np.cos(a)                         # Apply cosine to each element of a

In [4]: c = np.ones(25)                       # An array of 25 ones

In [5]: np.dot(c, c)                          # Compute inner product
Out[5]: 25.0
The SciPy library is built on top of NumPy and provides additional functionality. For example, let's calculate $\int_{-2}^{2} \phi(z) \, dz$ where $\phi$ is the standard normal density
In [5]: from scipy.stats import norm
In [6]: from scipy.integrate import quad
In [7]: phi = norm()
In [8]: value, error = quad(phi.pdf, -2, 2)
In [9]: value
Out[9]: 0.9544997361036417
Symbolic Algebra The SymPy library provides symbolic algebra within Python. For example, after declaring some symbolic variables we can manipulate expressions

In [10]: from sympy import Symbol

In [11]: x, y = Symbol('x'), Symbol('y')   # Treat 'x' and 'y' as algebraic symbols (illustrative setup)

In [12]: x + x + x + y
Out[12]: 3*x + y
We can also solve polynomials
In [15]: from sympy import solve
In [16]: solve(x**2 + x + 2)
Out[16]: [-1/2 - sqrt(7)*I/2, -1/2 + sqrt(7)*I/2]
The beauty of importing this functionality into Python is that we are working within a fully
fledged programming language
We can easily create tables of derivatives, generate LaTeX output, add it to figures, and so on
Statistics Python's data manipulation and statistics libraries have improved rapidly over the last few years

Pandas One of the most popular libraries for working with data is pandas

Pandas is fast, efficient, flexible and well designed

Here's a simple example
In [21]: import pandas as pd

In [22]: import scipy as sp

In [23]: data = sp.randn(5, 2)   # A 5x2 array of N(0, 1) random draws

In [24]: dates = pd.date_range('28/12/2010', periods=5)   # (illustrative setup, consistent with the output below)

In [25]: df = pd.DataFrame(data, columns=('price', 'weight'), index=dates)

In [26]: print df
               price    weight
2010-12-28  0.007255  1.129998
2010-12-29 -0.120587 -1.374846
2010-12-30  1.089384  0.612785
2010-12-31       ...  0.102297
2011-01-01       ...  1.254644

In [27]: df.mean()
Out[27]:
price     0.176616
weight    0.344975
Cloud Computing Running your Python code on massive servers in the cloud is becoming easier and easier

An excellent example is Wakari; we'll discuss how to get started with Wakari in the next lecture
See also
Amazon Elastic Compute Cloud
The Google App Engine (Python, Java, PHP or Go)
Pythonanywhere
Sagemath Cloud
Parallel Processing Apart from the cloud computing options listed above, you might like to consider

Parallel computing through IPython clusters

The StarCluster interface to Amazon's EC2

GPU programming through Copperhead or PyCUDA
Other Developments There are many other interesting developments with scientific programming in Python
Some representative examples include
IPython notebook: Python in your browser with code cells, embedded images, etc.

Numba: speed up scientific Python code

Blaze: a generalization of NumPy

PyMC: Bayesian statistical modeling in Python

PyTables: manage large data sets

CVXPY: convex optimization in Python
Learn More

Setting up Your Python Environment

Overview
Warning: The core Python package is easy to install, but is most likely not what you should choose for these lectures. The reason is that these lectures require the entire scientific programming ecosystem, which the core installation doesn't provide. Please read the following carefully.
By far the best approach for our purposes is to install one of the free Python distributions that
contains
1. the core Python language and
2. the most popular scientific libraries
There are several such distributions
Among them, our experience leads us to recommend Anaconda
What follows assumes that you adopt this recommendation
Installing Anaconda Installing Anaconda is straightforward: download the binary and follow
the instructions
Important: If you are asked during the installation process whether you'd like to make Anaconda your default Python installation, say yes
Otherwise you can accept all of the defaults
In the short term you'll probably find the ride less bumpy if you use Python 2.x rather than Python 3.x
Warning: If you do choose to use the Python 3.x Anaconda installer, you may come across occasional errors when using the QuantEcon package. This is because a few packages used by QuantEcon (such as statsmodels) are not yet included in the Python 3.x installer. The majority of QuantEcon will work, and over time this will become less of an issue as packages are updated for Python 3.4. For the time being, however, the default Python 2.x installer is the recommended one.
Keeping Anaconda up to Date The packages in Anaconda update regularly, so it's important to keep your distribution up to date
As a practice run, please execute the following
1. Open up a terminal

If you don't know what a terminal is

For Mac users, see this guide

For Windows users, search for the cmd application or see this guide

Linux users will already know what a terminal is
2. Type conda update anaconda
(If you installed Anaconda a while ago, please make sure you execute this step)
Get a Modern Browser Now is probably a good time to either
update your browser or
install a free modern browser such as Chrome or Firefox
Once you've done that we can fire up IPython notebook and start having fun
IPython Notebook

To start the IPython notebook, open up a terminal and type ipython notebook

This starts the notebook server and opens the notebook dashboard in your browser; the startup message gives you a URL such as http://127.0.0.1:8888/ where the notebook is running
The notebook displays an active cell, into which you can type Python commands
Notebook Basics
Note: The following assumes you have IPython notebook version 2.0 or above, which you will have if you've gone through the steps above
Notice that in the previous figure the cell is surrounded by a green border
This means that the cell is in edit mode
As a result, you can type in Python code and it will appear in the cell
When you're ready to execute these commands, hit Shift-Enter instead of the usual Enter

So far so good: you've run your first Python program
The next thing to understand about the IPython notebook is that it uses a modal editing system
This means that the effect of typing at the keyboard depends on which mode you are in
The two modes are
1. Edit mode
Indicated by a green border around one cell, as in the pictures above
Whatever you type appears as is in that cell
2. Command mode
The green border is replaced by a grey border

Keystrokes are interpreted as commands. For example, typing b adds a new cell below the current one
Switching modes
To switch to command mode from edit mode, hit the Esc key
To switch to edit mode from command mode, hit Enter or click in a cell
The modal behavior of the IPython notebook is a little tricky at first, but very efficient once you get used to it
For more details on the mechanics of using the notebook, see here
A Test Program Let's run a test program

Don't worry about the details for now; let's just run it and see what happens

The easiest way to run this code is to copy and paste it into a cell in the notebook
In-line Figures One nice thing about IPython notebooks is that figures can also be displayed inside the page

To achieve this effect, use the matplotlib inline magic: here we've done this by prepending %matplotlib inline to the cell and executing it again
Working with the Notebook In this section we'll run you quickly through some more IPython notebook essentials, just enough so that we can press ahead with programming
Tab Completion One nice feature of IPython is tab completion
For example, in the previous program we executed the line import numpy as np
NumPy is a numerical library well work with in depth
Functions in NumPy can be accessed with np.<function_name> type syntax (assuming you've executed import numpy as np)

One way to explore these functions is to type np.<start_of_word> in a cell and hit the Tab key
For example, here we type np.ran and hit Tab

IPython offers up the only two possible completions, random and rank
In this way, the Tab key helps remind you of whats available, and also saves you typing
On-Line Help To get help on np.rank, say, we can execute np.rank?
Documentation appears in a split window of the browser, like so
Clicking in the top right of the lower split closes the on-line help
Other Content In addition to executing code, the IPython notebook allows you to embed text,
equations, figures and even videos in the page
For example, here we enter a mixture of plain text and LaTeX instead of code
Next we hit Esc to enter command mode and then type m to indicate that we are writing Markdown, a mark-up language similar to (but simpler than) LaTeX

(You can also use your mouse to select Markdown from the Code drop-down box just below the list of menu items)

Now we hit Shift + Enter to render the cell
Sharing Notebooks A notebook can easily be saved and shared between users
Notebook files are just text files structured in JSON and typically ending with .ipynb
For example, try downloading the notebook we just created by clicking here
Save it somewhere you can navigate to easily
Now you can import it from the dashboard (the first browser page that opens when you start
IPython notebook) and run the cells or edit as discussed above
nbviewer The IPython organization has a site for sharing notebooks called nbviewer

The notebooks you see there are static HTML representations of notebooks
However, each notebook can be downloaded as an ipynb file by clicking on the download icon at
the top right of its page
Once downloaded you can open it as a notebook, as we discussed just above
Additional Software
There are some other bits and pieces we need to know about before we can proceed with the
lectures
The QuantEcon Package Along with a number of collaborators, we've written some software that is useful for solving economic problems
You can browse this code at our GitHub repository
The longer and more useful routines are in the folder quantecon
(Click on the folder named quantecon in the preceding link or just click here to browse the code)
In fact this set of files has been organized into a package
In Python, a package is a software library: a collection of programs that have been bundled for distribution
Installing QuantEcon You can install the package by copying and pasting the following into a
system terminal (terminal on Mac, cmd on Windows, etc.)
pip install quantecon
A full set of instructions on installing and keeping your code up to date can be found here
Other Files In addition to the QuantEcon package, the GitHub repository also contains example
programs, solutions to exercises and so on
You can download these files individually by navigating to the GitHub page for the individual file
You can then click the Raw button to get the plain text of the program
For example, see here for the file and here for the plain text
However, what you probably want to do is get a copy of the entire repo (repository)
Obtaining the GitHub Repo The easiest way to do this is to download the zip file by clicking the Download ZIP button on the main page

Make sure you remember where you unzip the directory, and put it somewhere you can find it easily again
Note: There is another way to get a copy of the repo, using a program called Git

We'll investigate how to do this in Exercise 2
We'll see how to run one of the example files from the repo in the next section

Working with Python Files How does one run and experiment with an existing Python file using the notebook?
For short files you can just copy and paste them into a cell
For longer files, you can run the code in the file in two steps
1. navigate to the correct directory
2. type run followed by the file name
Navigating to a Directory IPython notebook has a notion of present working directory, where it looks for files that you refer to during an IPython session

If you're trying to run a file not in the pwd (present working directory), you typically want to change the pwd to the location of the file
Here's how:
To check the pwd, type pwd in an IPython cell
To list files and directories in the pwd, type ls
To enter directory dir in the pwd, type cd dir
To enter a directory dir somewhere else in the directory tree, type cd /full/path/dir
On Windows it might look like this cd C:/Python27/Scripts/dir
On Linux it might look like this cd /home/user/scripts/dir
To go back one level, type cd ..
An Example As an exercise, let's try running the file white_noise_plot.py from the examples folder of the repo

In our case, we're working on a Linux machine, and the repo is in /tmp/test/quant-econ/
In this kind of work, you can typically just type the first letter or two of the directory name and
then use the tab key to expand
Load vs Run It's often convenient to be able to see your code before you run it

For this purpose we can replace run with load and then execute as usual

Saving Files To save the contents of a cell as file foo.py, put %%file foo.py as the first line of the cell and then hit Shift + Enter

Here %%file is an example of an IPython cell magic
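For instance, a minimal sketch (the file name foo.py is the one from the text; the cell contents below it are illustrative)

%%file foo.py
# Everything below the %%file line is written to foo.py in the pwd
print('foo')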
Alternatives
The preceding discussion covers most of what you need to know to write and run Python code
However, as you start to write longer programs, you might want to experiment with your workflow
There are many different options and we cover only a few
Text Editors One thing we find is that nothing beats the power and efficiency of a good text editor for working with program text
A text editor is an application that is specifically designed to work with text files such as Python
programs
A good text editor will provide syntax highlighting and allow you to search and replace, indent
blocks of code, etc., etc.
One that we recommend is Sublime Text, a popular and highly regarded text editor with a relatively moderate learning curve

Sublime Text is not free, but it does have an unlimited trial period, so you can take your time and see if you like it

There are many others, and a lot of them are free; you can find out more by googling for Python text editors

If you want a top quality free editor and don't mind a sharper learning curve, try Emacs

If you want an outstanding free text editor and don't mind a seemingly vertical learning curve (plus long days of pain and suffering while all your neural pathways are rewired), try Vim
Text Editors Plus IPython A good workflow for longer projects is to have an IPython notebook session and a text editor open side by side
The text editor contains whatever programs are currently being edited
To run them we switch to IPython notebook and type run filename.py in a cell
You can also experiment with bits of code in notebook cells and copy and paste into the main program via the text editor when you're satisfied

IPython Shell vs IPython Notebook If you're working with a text editor, you can also run Python programs in an ordinary IPython shell rather than an IPython notebook

To use an IPython shell, open up a terminal and type ipython
You should see something like this
The IPython shell has many of the features of the notebook: tab completion, color syntax, etc.
It also has command history through the arrow keys

Type any command, such as print foo

Now hit the up arrow key, and then return
Exercise 1 If IPython notebook is still running, quit by using Ctrl-C at the terminal where you started it

Now try launching it again, but this time using ipython notebook --no-browser

This should start the kernel without launching the browser

Note also the startup message: it should give you a URL such as http://127.0.0.1:8888 where the notebook is running
Now try connecting to the notebook from your browser using that URL

Exercise 2 Getting a copy of the repo using Git: the command is just git clone in front of the URL for our main repository
Even better, sign up to GitHub; it's free

Look into forking GitHub repositories

(Loosely speaking, forking means making your own copy of a GitHub repository, stored on GitHub)

Try forking the main repository for the course

Now try cloning it to some local directory, making edits, adding and committing them, and pushing them back up to your forked GitHub repo
For reading on these and other topics, try
The official Git documentation
Reading through the docs on GitHub
An Introductory Example

Overview
In this lecture we will write and then pick apart small Python programs
The objective is to introduce you to basic Python syntax and data structures
Deeper concepts (how things work) will be covered in later lectures
In reading the following, you should be conscious of the fact that all first programs are to some extent contrived

We try to avoid this, but nonetheless

Be aware that the programs are written to illustrate certain concepts

By the time you finish the course, you will be writing the same programs in a rather different, and more efficient, way
In particular, the scientific libraries will allow us to accomplish the same things much faster and
more efficiently, once we know how to use them
However, you also need to learn pure Python, the core language
This is the objective of the present lecture, and the next few lectures too
Prerequisites: An understanding of how to run simple programs, as described here
First Example: Plotting a White Noise Process

To begin, let's suppose that we want to simulate and plot the white noise process $\epsilon_0, \epsilon_1, \ldots, \epsilon_T$, where each draw $\epsilon_t$ is independent standard normal

In other words, we want to generate figures that look something like this:
A program that accomplishes what we want can be found in the file test_program_1.py from the
examples folder of the main repository
It reads as follows
1    import matplotlib.pyplot as plt
2    from random import normalvariate
3    ts_length = 100                  # Length of the series (value illustrative)
4    epsilon_values = []              # An empty list
5    for i in range(ts_length):
6        e = normalvariate(0, 1)
7        epsilon_values.append(e)
8    plt.plot(epsilon_values, 'b-')
9    plt.show()
In brief,

Lines 1–2 use the Python import keyword to pull in functionality from external libraries

Line 3 sets the desired length of the time series

Line 4 creates an empty list called epsilon_values that will store the $\epsilon_t$ values as we generate them

Line 5 tells the Python interpreter that it should cycle through the block of indented lines (lines 6–7) ts_length times before continuing to line 8

Lines 6–7 draw a new value $\epsilon_t$ and append it to the end of the list epsilon_values

Lines 8–9 generate the plot and display it to the user
Let's now break this down and see how the different parts work

Import Statements First we'll look at how to import functionality from outside your program, as in lines 1–2
Modules Consider the line from random import normalvariate
Here random is a module, which is just a file containing Python code
The statement from random import normalvariate causes the Python interpreter to

run the code in a file called random.py that was placed in your filesystem when you installed Python

make the function normalvariate defined in that file available for use in your program
If you want to import more attributes you can use a comma separated list, like so:
In [4]: from random import normalvariate, uniform
In [5]: normalvariate(0, 1)
Out[5]: -0.38430990243287594
In [6]: uniform(-1, 1)
Out[6]: 0.5492316853602877
Some imports, such as import matplotlib.pyplot as plt, involve packages rather than plain modules. Once a module or package has been imported, we can access anything defined within it via <name>.<attribute_name> syntax

So import matplotlib.pyplot as plt runs the __init__.py file in the directory matplotlib/pyplot and makes the attributes specified in that file available to us

The keyword as in import matplotlib.pyplot as plt just lets us access these attributes via a simpler name
Lists Next let's consider the statement epsilon_values = [], which creates an empty list

Lists are a native Python data structure used to group a collection of objects. For example
In [7]: x = [10, 'foo', False]
In [8]: type(x)
Out[8]: list
Here the first element of x is an integer, the next is a string and the third is a Boolean value
When adding a value to a list, we can use the syntax list_name.append(some_value)
In [9]: x
Out[9]: [10, 'foo', False]
In [10]: x.append(2.5)
In [11]: x
Out[11]: [10, 'foo', False, 2.5]
Here append() is what's called a method, which is a function attached to an object (in this case, the list x)

We'll learn all about methods later on, but just to give you some idea,

Python objects such as lists, strings, etc. all have methods that are used to manipulate the data contained in the object

String objects have string methods, list objects have list methods, etc.
Another useful list method is pop()
In [12]: x
Out[12]: [10, 'foo', False, 2.5]
In [13]: x.pop()
Out[13]: 2.5
In [14]: x
Out[14]: [10, 'foo', False]
The For Loop Now let's consider the for loop in test_program_1.py, which we repeat here for convenience, along with the line that follows it

for i in range(ts_length):
    e = normalvariate(0, 1)
    epsilon_values.append(e)
plt.plot(epsilon_values, 'b-')
The for loop causes Python to execute the two indented lines a total of ts_length times before
moving on
These two lines are called a code block, since they comprise the block of code that we are looping over

Unlike most other languages, Python knows the extent of the code block only from indentation

In particular, the fact that indentation decreases after line epsilon_values.append(e) tells Python that this line marks the lower limit of the code block

More on indentation below; for now let's look at another example of a for loop
animals = ['dog', 'cat', 'bird']
for animal in animals:
    print("The plural of " + animal + " is " + animal + "s")
If you put this in a text file and run it you will see
The plural of dog is dogs
The plural of cat is cats
The plural of bird is birds
This example helps to clarify how the for loop works: when we execute a loop of the form

for variable_name in sequence:
    <code block>

the Python interpreter binds, for each element of sequence, the name variable_name to that element and then executes the code block

The sequence object can in fact be a very general object, as we'll see soon enough
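As a small added illustration of this generality, the sequence can be, for example, a string

# Looping over a string steps through its characters
for letter in 'abc':
    print(letter)   # Prints a, b, c on separate lines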
Code Blocks and Indentation In discussing the for loop, we explained that the code blocks being looped over are delimited by indentation

In fact, in Python all code blocks (i.e., those occurring inside loops, if clauses, function definitions, etc.) are delimited by indentation

Thus, unlike most other languages, whitespace in Python code affects the output of the program
Once you get used to it, this is a very good thing because it
forces clean, consistent indentation, which improves readability
removes clutter, such as the brackets or end statements used in other languages
On the other hand, it takes a bit of care to get right, so please remember:
The line before the start of a code block always ends in a colon
for i in range(10):
if x > y:
while x < 100:
etc., etc.
All lines in a code block must have the same amount of indentation
The Python standard is 4 spaces, and thats what you should use
Tabs vs Spaces One small gotcha here is the mixing of tabs and spaces

(Important: Within text files, the internal representation of tabs and spaces is not the same)

You can use your Tab key to insert 4 spaces, but you need to make sure it's configured to do so

If you are using the IPython notebook you will have no problems here

(Also, good quality text editors will allow you to configure the Tab key to insert spaces instead of tabs; try searching online)
While Loops The for loop is the most common technique for iteration in Python

But, for the purpose of illustration, let's modify test_program_1.py to use a while loop instead

In Python, the while loop syntax is as shown in the file test_program_2.py below

1    import matplotlib.pyplot as plt
2    from random import normalvariate
3    ts_length = 100
4    epsilon_values = []
5    i = 0
6    while i < ts_length:
7        e = normalvariate(0, 1)
8        epsilon_values.append(e)
9        i = i + 1
10   plt.plot(epsilon_values, 'b-')
11   plt.show()
User-Defined Functions Next, let's restructure the program to make the logic clearer, by wrapping the data generation in a user-defined function, as in the file test_program_3.py below

1    import matplotlib.pyplot as plt
2    from random import normalvariate
3
4    def generate_data(n):
5        epsilon_values = []
6        for i in range(n):
7            e = normalvariate(0, 1)
8            epsilon_values.append(e)
9        return epsilon_values
10
11
12   data = generate_data(100)
13   plt.plot(data, 'b-')
14   plt.show()
Let's go over this carefully, in case you're not familiar with functions and how they work

We have defined a function called generate_data(), where the definition spans lines 4–9

def on line 4 is a Python keyword used to start function definitions

def generate_data(n): indicates that the function is called generate_data, and that it has a single argument n

Lines 5–9 are a code block called the function body; in this case it creates an iid list of random draws using the same logic as before

Line 9 indicates that the list epsilon_values is the object that should be returned to the calling code

This whole function definition is read by the Python interpreter and stored in memory

When the interpreter gets to the expression generate_data(100) in line 12, it executes the function body (lines 5–9) with n set equal to 100

The net result is that the name data on the left-hand side of line 12 is set equal to the list epsilon_values returned by the function
Conditions Our function generate_data() is rather limited

Let's make it slightly more useful by giving it the ability to return either standard normals or uniform random variables on (0, 1) as required

This is achieved in test_program_4.py by adding the argument generator_type to generate_data()

1    import matplotlib.pyplot as plt
2    from random import normalvariate, uniform
3
4    def generate_data(n, generator_type):
5        epsilon_values = []
6        for i in range(n):
7            if generator_type == 'U':
8                e = uniform(0, 1)
9            else:
10               e = normalvariate(0, 1)
11           epsilon_values.append(e)
12       return epsilon_values
13
14
15   data = generate_data(100, 'U')
16   plt.plot(data, 'b-')
17   plt.show()
Comments:

Hopefully the syntax of the if/else clause is self-explanatory, with indentation again delimiting the extent of the code blocks

We are passing the argument 'U' as a string, which is why we write it as 'U'

Notice that equality is tested with the == syntax, not =

For example, the statement a = 10 assigns the name a to the value 10

The expression a == 10 evaluates to either True or False, depending on the value of a

Now, there are two ways that we can simplify test_program_4
First, Python accepts the following conditional assignment syntax, used on line 7 of test_program_5.py below

1    import matplotlib.pyplot as plt
2    from random import normalvariate, uniform
3
4    def generate_data(n, generator_type):
5        epsilon_values = []
6        for i in range(n):
7            e = uniform(0, 1) if generator_type == 'U' else normalvariate(0, 1)
8            epsilon_values.append(e)
9        return epsilon_values
10
11
12   data = generate_data(100, 'U')
13   plt.plot(data, 'b-')
14   plt.show()
Second, and more importantly, we can get rid of the conditionals altogether by just passing the desired generator type as a function

To understand this, consider test_program_6.py

1    import matplotlib.pyplot as plt
2    from random import normalvariate, uniform
3
4    def generate_data(n, generator_type):
5        epsilon_values = []
6        for i in range(n):
7            e = generator_type(0, 1)
8            epsilon_values.append(e)
9        return epsilon_values
10
11   data = generate_data(100, uniform)
12   plt.plot(data, 'b-')
13   plt.show()
The only lines that have changed here are lines 7 and 11

In line 11, when we call the function generate_data(), we pass uniform as the second argument

The object uniform is in fact a function, defined in the random module

In [23]: from random import uniform

In [24]: uniform(0, 1)
Out[24]: 0.2981045489306786

When the function call generate_data(100, uniform) on line 11 is executed, Python runs the code block on lines 5–9 with n equal to 100 and the name generator_type bound to the function uniform

While these lines are executed, the names generator_type and uniform are synonyms, and can be used in identical ways

This principle works more generally; for example, consider the following piece of code

In [25]: max(7, 2, 4)
Out[25]: 7

In [26]: m = max

In [27]: m(7, 2, 4)
Out[27]: 7

Here we created another name for the built-in function max(), which could then be used in identical ways

In the context of our program, the ability to bind new names to functions means that there is no problem passing a function as an argument to another function, as we do in line 11
List Comprehensions Now is probably a good time to tell you that we can simplify the code for generating the list of random draws considerably by using something called a list comprehension

List comprehensions are an elegant Python tool for creating lists

Consider the following example, where the list comprehension is on the right-hand side of the second line

In [28]: animals = ['dog', 'cat', 'bird']

In [29]: plurals = [animal + 's' for animal in animals]

In [30]: plurals
Out[30]: ['dogs', 'cats', 'birds']

With this in hand, we can simplify our function generate_data() by turning the loop that builds epsilon_values into

epsilon_values = [generator_type(0, 1) for i in range(n)]
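Here's one more added example of a list comprehension, this time looping over a range of integers

In [31]: doubles = [2 * x for x in range(4)]   # 2*x for each x in 0, 1, 2, 3

In [32]: doubles
Out[32]: [0, 2, 4, 6]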
Using the Scientific Libraries As discussed at the start of the lecture, our example is somewhat
contrived
In practice we would use the scientific libraries, which can generate large arrays of independent
random draws much more efficiently
For example, try
In [34]: from numpy.random import randn
In [35]: epsilon_values = randn(5)
In [36]: epsilon_values
Out[36]: array([-0.15591709, -1.42157676, -0.67383208, -0.45932047, -0.17041278])
If $U_1, \ldots, U_n$ are iid copies of $U$, then, as $n$ gets large, the fraction that falls in $B$ converges to the probability of landing in $B$

For a circle, area = $\pi \times \text{radius}^2$
Exercise 4 Write a program that prints one realization of the following random device:
Flip an unbiased coin 10 times
If 3 consecutive heads occur one or more times within this sequence, pay one dollar
If not, pay nothing
Use no import besides from random import uniform
Exercise 5 Your next task is to simulate and plot the correlated time series

$x_{t+1} = \alpha \, x_t + \epsilon_{t+1}$ where $x_0 = 0$ and $t = 0, \ldots, T$
Exercise 6 Now, starting with your solution to Exercise 5, plot three simulated time series, one for each of the cases $\alpha = 0$, $\alpha = 0.8$ and $\alpha = 0.98$

In particular, you should produce (modulo randomness) a figure that looks as follows

(The figure nicely illustrates how time series with the same one-step-ahead conditional volatilities, as these three processes have, can have very different unconditional volatilities.)
In your solution, please restrict your import statements to
import matplotlib.pyplot as plt
from random import normalvariate
Solution notebook
Python Essentials

Overview

Topics:

Data types
Imports
Basic file I/O
The Pythonic approach to iteration
More on user-defined functions
Comparisons and logic
Standard Python style
Data Types

So far we've briefly met several common data types, such as strings, integers, floats and lists

Let's learn a bit more about them
Primitive Data Types A particularly simple data type is Boolean values, which can be either
True or False
In [1]: x = True
In [2]: y = 100 < 10
In [3]: y
Out[3]: False
In [4]: type(y)
Out[4]: bool
In arithmetic contexts, True is treated as 1 and False as 0, so Booleans can be summed

In [8]: bools = [True, True, True, False]

In [9]: sum(bools)
Out[9]: 3
The two most common data types used to represent numbers are integers and floats

In [1]: a, b = 1, 2

In [2]: c, d = 2.5, 10.0

In [3]: type(a)
Out[3]: int

In [4]: type(c)
Out[4]: float

Computers distinguish between the two because, while floats are more informative, internal arithmetic operations on integers are more straightforward
Warning: Be careful: In Python 2.x, division of two integers returns only the integer part

To clarify:

In [5]: 1 / 2      # Integer division
Out[5]: 0

In [7]: 1.0 / 2    # Floating point division
Out[7]: 0.5
There are several more primitive data types that we'll introduce as necessary
Containers Python has several basic types for storing collections of (possibly heterogeneous)
data
We have already discussed lists
A related data type is tuples, which are immutable lists

In [13]: x = ('a', 'b')

In [14]: x = 'a', 'b'   # Round brackets are optional

In [15]: x
Out[15]: ('a', 'b')
In [16]: type(x)
Out[16]: tuple
In Python, an object is called immutable if, once created, the object cannot be changed
Lists are mutable while tuples are not
In [17]: x = [1, 2]
In [18]: x[0] = 10
In [19]: x = (1, 2)

In [20]: x[0] = 10    # Trying to mutate a tuple produces an error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-6cb4d74ca096> in <module>()
----> 1 x[0]=10

TypeError: 'tuple' object does not support item assignment
We'll say more about mutable vs immutable a bit later, and explain why the distinction is important
Tuples (and lists) can be unpacked as follows
In [21]: integers = (10, 20, 30)
In [22]: x, y, z = integers
In [23]: x
Out[23]: 10
In [24]: y
Out[24]: 20
Sets and Dictionaries Two other container types we should mention before moving on are sets
and dictionaries
Dictionaries are much like lists, except that the items are named instead of numbered
In [25]: d = {'name': 'Frodo', 'age': 33}
In [26]: type(d)
Out[26]: dict
In [27]: d['age']
Out[27]: 33
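Sets, the other container type mentioned above, are unordered collections without duplicates; here's a minimal added illustration using standard set operations

In [28]: s1 = set(['a', 'b'])

In [29]: s2 = set(['b', 'c'])

In [30]: s1.intersection(s2)   # Elements that belong to both sets
Out[30]: set(['b'])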
Imports
From the start, Python has been designed around the twin principles of
a small core language
extra functionality in separate libraries or modules
For example, if you want to compute the square root of an arbitrary number, there's no built-in function that will perform this for you

Instead, you need to import the functionality from a module; in this case a natural choice is math
In [1]: import math
In [2]: math.sqrt(4)
Out[2]: 2.0
Alternatively, the statement from math import * pulls all of the functionality of math into the current namespace, a concept we'll define formally later on

In [3]: from math import *

In [4]: sqrt(4)
Out[4]: 2.0

Actually this kind of syntax should be avoided for the most part

In essence the reason is that it pulls in lots of variable names without explicitly listing them; a potential source of conflicts
Input and Output

Let's have a quick look at reading from and writing to text files, starting with writing

In [35]: f = open('newfile.txt', 'w')   # Open 'newfile.txt' for writing

In [36]: f.write('Testing\n')           # Here '\n' means new line

In [37]: f.close()

Here

The built-in function open() creates a file object for writing to

Both write() and close() are methods of file objects
Where is this file that we've created?
Recall that Python maintains a concept of the current working directory (cwd), which can be located via

import os
print os.getcwd()
Paths Note that if newfile.txt is not in the cwd then this call to open() fails
In this case you can either specify the full path to the file
In [43]: f = open('insert_full_path_to_file/newfile.txt', 'r')
or change the current working directory to the location of the file via os.chdir(path_to_file)
(In IPython, use cd to change directories)
Details are OS specific, but a Google search on paths and Python should yield plenty of examples

Iterating

One of the most important tasks in computing is stepping through a sequence of data and performing a given action

One of Python's strengths is its simple, flexible interface to this kind of iteration via the for loop
Looping over Different Objects Many Python objects are iterable, in the sense that they can
be placed to the right of in within a for loop statement
To give an example, suppose that we have a file called us_cities.txt listing US cities and their
population
new york: 8244910
los angeles: 3819702
chicago: 2707120
houston: 2145146
philadelphia: 1536471
phoenix: 1469471
san antonio: 1359758
san diego: 1326179
dallas: 1223229
Suppose that we want to make the information more readable, by capitalizing names and adding
commas to mark thousands
The following program reads the data in and makes the conversion
1    f = open('us_cities.txt', 'r')
2    for line in f:
3        city, population = line.split(':')             # Tuple unpacking
4        city = city.title()                            # Capitalize city names
5        population = '{0:,}'.format(int(population))   # Add commas to numbers
6        print(city.ljust(15) + population)             # (column formatting here is illustrative)
7    f.close()
Here format() is a powerful string method used for inserting variables into strings
The output is as follows
New York        8,244,910
Los Angeles     3,819,702
Chicago         2,707,120
Houston         2,145,146
Philadelphia    1,536,471
Phoenix         1,469,471
San Antonio     1,359,758
San Diego       1,326,179
Dallas          1,223,229
The reformatting of each line is the result of three different string methods, the details of which can be left till later

The interesting part of this program for us is line 2, which shows that
1. The file object f is iterable, in the sense that it can be placed to the right of in within a for
loop
2. Iteration steps through each line in the file
This leads to the clean, convenient syntax shown in our program
Many other kinds of objects are iterable, and well discuss some of them later on
Looping without Indices One thing you might have noticed is that Python tends to favor looping without explicit indexing

For example,

for x in x_values:
    print x * x

is preferred to

for i in range(len(x_values)):
    print x_values[i] * x_values[i]
When you compare these two alternatives, you can see why the first one is preferred
Python provides some facilities to simplify looping without indices
One is zip(), which is used for stepping through pairs from two sequences
For example, try running the following code
countries = ('Japan', 'Korea', 'China')
cities = ('Tokyo', 'Seoul', 'Beijing')
for country, city in zip(countries, cities):
    print 'The capital of {0} is {1}'.format(country, city)
The zip() function is also useful for creating dictionaries; for example
In [1]: names = ['Tom', 'John']
In [2]: marks = ['E', 'F']
In [3]: dict(zip(names, marks))
Out[3]: {'John': 'F', 'Tom': 'E'}
If we actually need the index from a list, one option is to use enumerate()
To understand what enumerate() does, consider the following example
letter_list = ['a', 'b', 'c']
for index, letter in enumerate(letter_list):
    print "letter_list[{0}] = '{1}'".format(index, letter)
letter_list[0] = 'a'
letter_list[1] = 'b'
letter_list[2] = 'c'
Comparisons and Logic

Comparisons Many different kinds of expressions evaluate to one of the Boolean values (i.e., True or False)
A common type is comparisons, such as
In [44]: x, y = 1, 2
In [45]: x < y
Out[45]: True
In [46]: x > y
Out[46]: False
As we saw earlier, = is for assignment while == tests for equality

In [49]: x = 1     # Assignment

In [50]: x == 2    # Comparison
Out[50]: False
Note that when testing conditions, we can use any valid Python expression
In [52]: x = 'yes' if 42 else 'no'
In [53]: x
Out[53]: 'yes'
In [54]: x = 'yes' if [] else 'no'
In [55]: x
Out[55]: 'no'
Here's the rule:
Expressions that evaluate to zero, empty sequences/containers (strings, lists, etc.) and None
are equivalent to False
All other values are equivalent to True
Combining Expressions We can combine expressions using and, or and not
These are the standard logical connectives (conjunction, disjunction and negation)
In [56]: 1 < 2 and 'f' in 'foo'
Out[56]: True
In [57]: 1 < 2 and 'g' in 'foo'
Out[57]: False
In [58]: 1 < 2 or 'g' in 'foo'
Out[58]: True
In [59]: not True
Out[59]: False
In [60]: not not True
Out[60]: True
Remember
P and Q is True if both are True, else False
P or Q is False if both are False, else True
More Functions

Let's talk a bit more about functions, which are all-important for good programming style
Python has a number of built-in functions that are available without import
We have already met some
In [61]: max(19, 20)
Out[61]: 20
In [62]: range(4)
Out[62]: [0, 1, 2, 3]
In [63]: str(22)
Out[63]: '22'
In [64]: type(22)
Out[64]: int
Two more useful built-in functions are all() and any(); their behavior on a list of Booleans (the list itself is illustrative) is as follows

In [65]: bools = [True, True, True, False]

In [66]: all(bools)    # True if all are True and False otherwise
Out[66]: False

In [67]: any(bools)    # False if all are False and True otherwise
Out[67]: True
Functions without a return statement automatically return the special Python object None
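A quick added check of this fact:

def say_hi():
    print('hi')        # No return statement in this function

result = say_hi()      # Prints 'hi'
print(result is None)  # Prints True: say_hi() returned None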
Docstrings Python has a system for adding comments to functions, modules, etc., called docstrings

The nice thing about docstrings is that they are available at run-time
For example, let's say that this code resides in the file temp.py
# Filename: temp.py

def f(x):
    """
    This function squares its argument
    """
    return x**2
After it has been run in the IPython shell, the docstring is available as follows
In [1]: run temp.py

In [2]: f?
Type:        function
String Form: <function f at 0x2223320>
File:        /home/john/temp/temp.py
Definition:  f(x)
Docstring:   This function squares its argument

In [3]: f??
Type:        function
String Form: <function f at 0x2223320>
File:        /home/john/temp/temp.py
Definition:  f(x)
Source:
def f(x):
    """
    This function squares its argument
    """
    return x**2
With one question mark we bring up the docstring, and with two we get the source code as well
One-Line Functions: lambda The lambda keyword is used to create simple functions on one line

For example, the definitions

def f(x):
    return x**3

and

f = lambda x: x**3

are entirely equivalent

To see why lambda is useful, suppose that we want to calculate $\int_0^2 x^3 \, dx$

The SciPy library has a function called quad that will do this calculation for us

The syntax of the quad function is quad(f, a, b) where f is a function and a and b are numbers

To create the function $f(x) = x^3$ we can use lambda as follows

In [1]: from scipy.integrate import quad

In [2]: quad(lambda x: x**3, 0, 2)
Out[2]: (4.0, 4.440892098500626e-14)
Here the function created by lambda is said to be anonymous, because it was never given a name
Keyword Arguments If you did the exercises in the previous lecture, you would have come across
the statement
plt.plot(x, 'b-', label="white noise")
In this call to Matplotlib's plot function, notice that the last argument is passed in name=argument syntax

This is called a keyword argument, with label being the keyword

Non-keyword arguments are called positional arguments, since their meaning is determined by order

plot(x, 'b-', label="white noise") is different from plot('b-', x, label="white noise")
Keyword arguments are particularly useful when a function has a lot of arguments, in which case
its hard to remember the right order
You can adopt keyword arguments in user defined functions with no difficulty
The next example illustrates the syntax
def f(x, coefficients=(1, 1)):
    a, b = coefficients
    return a + b * x
Notice that the keyword argument values we supplied in the definition of f become the default
values
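For example, with the definition above, calls might look as follows (the particular numbers are just for illustration)

f(2)                         # Uses default coefficients (1, 1): returns 1 + 1 * 2 = 3
f(2, coefficients=(3, 4))    # Overrides the default: returns 3 + 4 * 2 = 11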
Coding Style and PEP8

To learn more about the Python programming philosophy type import this at the prompt

Among other things, Python strongly favors consistency in programming style

We've all heard the saying about consistency and little minds

In programming, as in mathematics, quite the opposite is true

A mathematical paper where two standard symbols (say, ∪ and ∩) were reversed would be very hard to read, even if the author told you so on the first page

In Python, the style that all good programs follow is set out in PEP8

We recommend that you slowly learn it, and follow it in your programs
Exercises
Exercise 1 Part 1: Given two numeric lists or tuples x_vals and y_vals of equal length, compute
their inner product using zip()
Part 2: In one line, count the number of even numbers in 0,...,99
Hint: x % 2 returns 0 if x is even, 1 otherwise
Part 3: Given pairs = ((2, 5), (4, 2), (9, 8), (12, 10)), count the number of pairs (a, b)
such that both a and b are even
Exercise 2 Consider the polynomial

$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = \sum_{i=0}^{n} a_i x^i \qquad (1.1)$

Write a function p such that p(x, coeff) computes the value in (1.1) given a point x and a list of coefficients coeff

Try to use enumerate() in your loop
Exercise 3 Write a function that takes a string as an argument and returns the number of capital
letters in the string
Hint: 'foo'.upper() returns 'FOO'
Exercise 4 Write a function that takes two sequences seq_a and seq_b as arguments and returns
True if every element in seq_a is also an element of seq_b, else False
By sequence we mean a list, a tuple or a string
Do the exercise without using sets and set methods
Exercise 5 When we cover the numerical libraries, we will see they include many alternatives for interpolation and function approximation

Nevertheless, let's write our own function approximation routine as an exercise

In particular, without using any imports, write a function linapprox that takes as arguments

A function f mapping some interval [a, b] into R

two scalars a and b providing the limits of this interval
Solution notebook

Object Oriented Programming

Overview

OOP is one of the major paradigms in programming, and is nicely supported in Python

OOP has become an important concept in modern software engineering because
It can help facilitate clean, efficient code (when used well)
The OOP design pattern fits well with the human brain
OOP is all about how to organize your code
This topic is important! Proper organization of code is a critical determinant of productivity
Moreover, OOP is a part of Python, and to progress further its necessary to understand the basics
About OOP

OOP is supported in many languages; for example

Fortran and MATLAB are mainly procedural, but with some OOP recently tacked on

C is a procedural language, while C++ is C with OOP added on top

Let's look at general concepts before we specialize to Python
Key Concepts The traditional (non-OOP) paradigm is called procedural, and works as follows
The program has a state that contains the values of its variables
Functions are called to act on these data according to the task
Data are passed back and forth via function calls
In contrast, in the OOP paradigm, data and functions are bundled together into objects
An example is a Python list, which not only stores data, but also knows how to sort itself, etc.
In [1]: x = [1, 5, 4]
In [2]: x.sort()
In [3]: x
Out[3]: [1, 4, 5]
x is an object or instance, created from the definition for Python lists, but with its own particular data
x.sort() and x.__class__ are two attributes of x
dir(x) can be used to view all the attributes of x
Another Example Let's look at an example of object-oriented design, this time from a third party module
Python can be used to send, receive and organize email through low-level libraries that interact
with mail servers
The envelopes module by Tomek Wojcik provides a nice high-level interface to these kinds of
tasks
In the module, emails are represented as objects that
contain data (recipient list, subject, attachments, body, etc.)
possess methods that act on this and other data (add attachments, send the email, etc.)
Here's an example of usage provided by the developer

from envelopes import Envelope

envelope = Envelope(
    from_addr=(u'from@example.com', u'From Example'),
    to_addr=(u'to@example.com', u'To Example'),
    subject=u'Envelopes demo',
    text_body=u"I'm a helicopter!")

envelope.add_attachment('/Users/bilbo/Pictures/helicopter.jpg')

envelope.send('smtp.googlemail.com', login='from@example.com',
              password='password', tls=True)
For an example more relevant to OOP, consider the open windows on your desktop
Windows have common functionality and individual data, which makes them suitable for implementing with OOP
individual data: contents of specific windows
common functionality: closing, maximizing, etc.
Your window manager almost certainly uses OOP to generate and manage these windows
individual windows created as objects / instances from a class definition, with their own
data
common functionality implemented as methods, which all of these objects share
Another, more prosaic, use of OOP is data encapsulation
Data encapsulation means storing variables inside some structure so that they are not directly
accessible
The alternative to this is filling the global namespace with variable names, frequently leading to
conflicts
Think of the global namespace as any name you can refer to without a dot in front of it
For example, the modules os and sys both define a different attribute called path
The following code leads immediately to a conflict
from os import path
from sys import path
At this point, both variables have been brought into the global namespace, and the second will
shadow the first
A better idea is to replace the above with
import os
import sys
and then reference the path you want with either os.path or sys.path
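A small added illustration of the dotted style just described:

import os
import sys

print(os.path.join('tmp', 'test'))   # The path attribute of the os module
print(sys.path[0])                   # The (unrelated) path attribute of sys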
In this example, we see that modules provide one means of data encapsulation
As will now become clear, OOP provides another
Defining Your Own Classes

As a first step we are going to try defining very simple classes, the main purpose of which is data encapsulation

Suppose that we are solving an economic model, and one small part of the model is a firm, characterized by a production function $f(k) = k^{0.5}$ and a few parameter values

We can bundle these into an object along the following lines (a minimal sketch; the parameter values shown are illustrative)

class Firm:
    pass

firm = Firm()                 # Create an instance of the (empty) class

firm.f = lambda k: k**0.5     # Production function
firm.beta = 0.99              # Discount factor (illustrative)
firm.k = 10.0                 # Capital stock (illustrative)

Here

The first two lines form the simplest class definition possible in Python, in this case called Firm

The third line creates an object called firm as an instance of class Firm

The last three lines dynamically add attributes to the object firm
Data and Methods The Firm example is only barely OOP; in fact you can do the same kind of thing with a MATLAB class or C struct

Usually classes also define methods that act on the data contained by the object

For example, the list method sort() in x.sort()

Let's try to build something a bit closer to this standard conception of OOP

Since the notation used to define classes seems complex on first pass, we will start with a very simple (and rather contrived) example

In particular, let's build a class to represent dice

The data associated with a given dice will be the side facing up

The only method will be a method to roll the dice (and hence change the state)

The following is pseudocode: a class definition in a mix of Python and plain English
class Dice:

    data:
        current_face -- the side facing up (i.e., number of dots showing)

    methods:
        roll -- roll the dice (i.e., change current_face)
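Here is a Python implementation consistent with this pseudocode and with the attributes discussed below; the starting value of current_face is an assumption

import random

class Dice:

    faces = (1, 2, 3, 4, 5, 6)    # A class attribute, shared by all instances

    def __init__(self):
        self.current_face = 1     # An instance attribute (initial value assumed)

    def roll(self):
        faces = Dice.faces
        self.current_face = random.choice(faces)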
There's some difficult notation here, but the broad picture is as follows:
The faces variable is a class attribute
it will be shared by every member of the class (i.e., every die)
The current_face variable is an instance attribute
each die that we create will have its own version
The __init__ denotes a special method called a constructor
Used to create instances (objects) from the class definition, with their own data
The roll method rolls the die, changing the state of a particular instance
Once we've run the program, the class definition is loaded into memory
In [7]: Dice
Out[7]: __main__.Dice
In [8]: dir(Dice)
Out[8]: ['__doc__', '__init__', '__module__', 'faces', 'roll']
In [9]: Dice.faces
Out[9]: (1, 2, 3, 4, 5, 6)
Let's now create two dice

In [10]: d = Dice()

In [11]: e = Dice()

These two statements implicitly call the __init__ method to build two instances of Dice

When we roll each dice, the roll method will only affect the instance variable of that particular instance
Perhaps the most difficult part of all of this notation is the self keyword in the Dice class definition
The simplest way to think of it is that self refers to a particular instance
If we want to refer to instance variables, as opposed to class or global variables, then we need to
use self
In addition, we need to put self as the first argument to every method defined in the class
Further Details You might want to leave it at that for now, but if you still want to know more
about self, here goes
Consider the method call d.roll()
This is in fact translated by Python into the call Dice.roll(d)
So in fact we are calling method roll() defined in class object Dice with instance d as the argument
Hence, when roll() executes, self is bound to d
In this way, self.current_face = random.choice(faces) affects d.current_face, which is what
we want
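We can confirm this equivalence directly (a small added check):

d = Dice()
Dice.roll(d)             # Exactly equivalent to d.roll()
print(d.current_face)    # The state of d has been updated by the roll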
Example 2: The Quadratic Map Let's look at one more example

The quadratic map difference equation is given by

$x_{t+1} = 4 (1 - x_t) x_t, \qquad x_0 \in [0, 1] \text{ given} \qquad (1.2)$

Let's write a class for generating time series, where the data record the current location of the state $x_t$

Here's the start of one implementation, in the file examples/quadmap_class.py (the method bodies shown are a sketch consistent with (1.2))

"""
Filename: quadmap_class.py
Authors: John Stachurski, Thomas J. Sargent
"""

class QuadMap(object):

    def __init__(self, initial_state):
        self.x = initial_state

    def update(self):
        "Apply the quadratic map to update the state x"
        self.x = 4 * (1 - self.x) * self.x
Special Methods
Python provides certain special methods with which a number of neat tricks can be performed
For example, recall that lists and tuples have a notion of length, and this length can be queried via
the len function
In [21]: x = (10, 20)
In [22]: len(x)
Out[22]: 2
If you want to provide a return value for the len function when applied to your user-defined
object, use the __len__ special method
class Foo:

    def __len__(self):
        return 42
Now we get

In [23]: f = Foo()

In [24]: len(f)
Out[24]: 42

A related special method is __call__, which makes an instance callable just like a function. For example, given a class that implements __call__, an instance F can be evaluated at a point directly

In [33]: F(0.5)
Out[33]: 0.479

Exercises

Exercise: Write a class called Polynomial to represent the polynomial

$p(x) = a_0 + a_1 x + \cdots + a_N x^N = \sum_{n=0}^{N} a_n x^n \qquad (x \in \mathbb{R}) \qquad (1.4)$
The instance data for the class Polynomial will be the coefficients (in the case of (1.4), the numbers $a_0, \ldots, a_N$)

Provide methods that

1. Evaluate the polynomial (1.4), returning $p(x)$ for any $x$

2. Differentiate the polynomial, replacing the original coefficients with those of its derivative $p'$

Avoid using any import statements
Solutions
Solution notebook
How it Works: Data, Variables and Names

Overview

The objective of the lecture is to provide a deeper understanding of Python's execution model

Understanding these details is important for writing larger programs

You should feel free to skip this material on first pass and continue on to the applications

We provide this material mainly as a reference, and for returning to occasionally to build your Python skills
Objects

In Python everything in memory is treated as an object. For example, a statement such as x = ['foo', 'bar'] creates (an instance of) a list, possessing various methods (append, pop, etc.)
This includes not just lists, strings, etc., but also less obvious things, such as
functions (once they have been read into memory)
modules (ditto)
files opened for reading or writing
integers, etc.
At this point it is helpful to have a clearer idea of what an object is in Python
In Python, an object is a collection of data and instructions held in computer memory that consists
of
1. a type
2. some content
3. a unique identity
4. zero or more methods
These concepts are discussed sequentially in the remainder of this section
Type Python understands and provides for different types of objects, to accommodate different
types of data
The type of an object can be queried via type(object_name)
For example
In [2]: s = 'This is a string'
In [3]: type(s)
Out[3]: str
In [4]: x = 42
In [5]: type(x)
Out[5]: int
Consider, for example, the expression '300' + 400

Here we are mixing types, and it's unclear to Python whether the user wants to

convert '300' to an integer and then add it to 400, or

convert 400 to a string and then concatenate it with '300'

Some languages might try to guess, but Python is strongly typed

Type is important, and implicit type conversion is rare

Python will respond instead by raising a TypeError
--------------------------------------------------------------------------TypeError
Traceback (most recent call last)
<ipython-input-9-9b7dffd27f2d> in <module>()
----> 1 '300' + 400
TypeError: cannot concatenate 'str' and 'int' objects
To avoid the error, you need to clarify by changing the relevant type
For example,
In [9]: int('300') + 400
Out[9]: 700
Content Consider again the statement x = 42

In [10]: x = 42

When Python creates this integer object, it stores with it various auxiliary information, such as the imaginary part, and the type

In [11]: x.imag
Out[11]: 0

In [12]: x.__class__
Out[12]: int
As discussed previously, any name following a dot is called an attribute of the object to the left of
the dot
For example, imag and __class__ are attributes of x
Identity In Python, each object has a unique identifier, which helps Python (and us) keep track
of the object
The identity of an object can be obtained via the id() function
In [14]: y = 2.5
In [15]: z = 2.5
In [16]: id(y)
Out[16]: 166719660
In [17]: id(z)
Out[17]: 166719740
In this example, y and z happen to have the same value (i.e., 2.5), but they are not the same object
The identity of an object is in fact just the address of the object in memory
Methods As discussed earlier, methods are functions that are bundled with objects
Formally, methods are attributes of objects that are callable (i.e., can be called as functions)
In [18]: x = ['foo', 'bar']
In [19]: callable(x.append)
Out[19]: True
In [20]: callable(x.__doc__)
Out[20]: False
Methods typically act on the data contained in the object they belong to, or combine that data with
other data
In [21]: x = ['a', 'b']
In [22]: x.append('c')
In [23]: s = 'This is a string'
In [24]: s.upper()
Out[24]: 'THIS IS A STRING'
In [25]: s.lower()
Out[25]: 'this is a string'
In [26]: s.replace('This', 'That')
Out[26]: 'That is a string'
In [27]: x = ['a', 'b']

In [28]: x[0] = 'aa'    # Item assignment using square bracket notation

In [29]: x
Out[29]: ['aa', 'b']

It doesn't look like there are any methods used here, but in fact the square bracket assignment notation is just a convenient interface to a method call
What actually happens is that Python calls the __setitem__ method, as follows
In [30]: x = ['a', 'b']
In [31]: x.__setitem__(0, 'aa')
In [32]: x
Out[32]: ['aa', 'b']
(If you wanted to, you could modify the __setitem__ method so that square bracket assignment does something totally different)
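To illustrate, here is a small, purely hypothetical subclass of list whose __setitem__ announces each assignment before delegating to the usual behavior:

class LoudList(list):

    def __setitem__(self, index, value):
        # Announce the assignment, then fall back to the normal list behavior
        print('setting element %s to %s' % (index, value))
        list.__setitem__(self, index, value)

With this definition, x = LoudList(['a', 'b']) followed by x[0] = 'aa' prints a message and then performs the assignment as usual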
Everything is an Object Above we said that in Python everything is an object; let's look at this again
Consider, for example, functions
When Python reads a function definition, it creates a function object and stores it in memory
The following code illustrates
In [33]: def f(x): return x**2
In [34]: f
Out[34]: <function __main__.f>
In [35]: type(f)
Out[35]: function
In [36]: id(f)
Out[36]: 3074342380L    # An address in memory; the exact value will differ on your machine
We can see that f has type, identity, attributes and so onjust like any other object
Likewise modules loaded into memory are treated as objects
In [38]: import math
In [39]: id(math)
Out[39]: 3074329380L
This uniform treatment of data in Python (everything is an object) helps keep the language simple
and consistent
Iterables and Iterators

An iterator is an object that returns successive values via a next method
Files opened for reading are one example: file objects have a next method, and calling this method returns the next line in the file
The objects returned by enumerate() are also iterators
In [43]: e = enumerate(['foo', 'bar'])

In [44]: e.next()
Out[44]: (0, 'foo')

In [45]: e.next()
Out[45]: (1, 'bar')

So are the file-like objects returned by urllib2.urlopen(), which step through the lines of a web page

In [53]: import urllib2

In [54]: webpage = urllib2.urlopen('http://cnn.com')

In [55]: webpage.next()
Out[55]: '<meta name="Description" content="CNN.com delivers the latest breaking news and information..'
Iterators in For Loops All iterators can be placed to the right of the in keyword in for loop
statements
In fact this is how the for loop works: If we write

for x in iterator:
    <code block>

then the interpreter calls iterator.next() and binds x to the result, executes the code block, and repeats until a StopIteration error occurs
One thing to remember about iterators is that they are depleted by use
In [72]: x = [10, -10]
In [73]: y = iter(x)
In [74]: max(y)
Out[74]: 10
In [75]: max(y)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Names Consider again a statement such as

x = 42

We now know that when this statement is executed, Python creates an object of type int in your computer's memory, containing
the value 42
some associated attributes
But what is x itself?
In Python, x is called a name, and the statement x = 42 binds the name x to the integer object we
have just discussed
Under the hood, this process of binding names to objects is implemented as a dictionary; more about this in a moment
There is no problem binding two or more names to the one object, regardless of what that object is
In [77]: def f(string):
   ....:     print(string)
   ....:
In [78]: g = f
In [79]: id(g) == id(f)
Out[79]: True
In [80]: g('test')
test
In the first step, a function object is created, and the name f is bound to it
After binding the name g to the same object, we can use it anywhere we would use f
What happens when the number of names bound to an object goes to zero?
Here's an example of this situation, where the name x is first bound to one object and then rebound to another
In [81]: x = 'foo'
In [82]: id(x)
Out[82]: 164994764
In [83]: x = 'bar'
What happens here is that the first object, with identity 164994764, is garbage collected
In other words, the memory slot that stores that object is deallocated, and returned to the operating
system
Namespaces Recall from the preceding discussion that the statement

In [84]: x = 42

binds the name x to an integer object
The collection of such name-object bindings is called a namespace
Next let's import the math module from the standard library

In [86]: import math

The math module has its own binding for the name pi, and we can also bind pi at the prompt; these two different bindings of pi exist in different namespaces, each one implemented as a dictionary
We can look at the dictionary directly, using module_name.__dict__
In [89]: import math
In [90]: math.__dict__
Out[90]: {'pow': <built-in function pow>, ..., 'pi': 3.1415926535897931, ...}    # Edited output
As you know, we access elements of the namespace using the dotted attribute notation
In [93]: math.pi
Out[93]: 3.1415926535897931
Viewing Namespaces As we saw above, the math namespace can be printed by typing
math.__dict__
Another way to see its contents is to type vars(math)
In [95]: vars(math)
Out[95]: {'pow': <built-in function pow>,...
Interactive Sessions In Python, all code executed by the interpreter runs in some module
What about commands typed at the prompt?
These are also regarded as being executed within a module; in this case, a module called __main__
To check this, we can look at the current module name via the value of __name__ given at the
prompt
In [99]: print(__name__)
__main__
When we run a script using IPythons run command, the contents of the file are executed as part
of __main__ too
To see this, let's create a file mod.py that prints its own __name__ attribute

# Filename: mod.py
print(__name__)

Now we can import the file as a module, or run it as a script

In [1]: import mod        # Standard import
mod

In [2]: run mod.py        # Run interactively
__main__

In the first case, the code is executed as part of the module mod, so __name__ is equal to mod
In the second case, the code is executed as part of __main__, so __name__ is equal to __main__
To see the contents of the namespace of __main__ we use vars() rather than vars(__main__)
If you do this in IPython, you will see a whole lot of variables that IPython needs, and has initialized when you started up your session
If you prefer to see only the variables you have initialized, use whos
In [3]: x = 2
In [4]: y = 3
In [5]: import numpy as np
In [6]: whos
Variable   Type      Data/Info
------------------------------
np         module    <module 'numpy' from '/us<...>ages/numpy/__init__.pyc'>
x          int       2
y          int       3
The Global Namespace Python documentation often makes reference to the global namespace
The global namespace is the namespace of the module currently being executed
For example, suppose that we start the interpreter and begin making assignments
We are now working in the module __main__, and hence the namespace for __main__ is the global
namespace
Next, we import a module called amodule
In [7]: import amodule
At this point, the interpreter creates a namespace for the module amodule and starts executing
commands in the module
The Builtin Namespace Python has a number of built-in functions, such as len(), dir() and list(), that are always available
These definitions are stored in a module called __builtins__, whose namespace we can inspect in the usual way

In [13]: dir(__builtins__)
Out[13]: [... 'iter', 'len', 'license', 'list', 'locals', ...]    # Edited output
But __builtins__ is special, because we can always access these names directly, without the module prefix
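The final ingredient is functions defined inside other functions; as a concrete, illustrative example, consider

def f():
    a = 2
    def g():          # g is defined in the body of f
        b = 4
        print(a * b)  # a is not local to g; it is found in f's namespace
    g()

Calling f() prints 8: when g executes, the name a is not found in g's local namespace, so the search moves outward to the enclosing function f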
Here f is the enclosing function for g, and each function gets its own namespace
Now we can give the rule for how namespace resolution works:
The order in which the interpreter searches for names is
1. the local namespace (if it exists)
2. the hierarchy of enclosing namespaces (if they exist)
3. the global namespace
4. the builtin namespace
If the name is not in any of these namespaces, the interpreter raises a NameError
This is called the LEGB rule (local, enclosing, global, builtin)
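To see these rules in action, consider a session along the following lines (a sketch consistent with the step-by-step account below; the In/Out numbering is illustrative):

In [14]: def g(x):
   ....:     a = 1
   ....:     x = x + a
   ....:     return x
   ....:

In [15]: a = 0

In [16]: y = g(10)

In [17]: y
Out[17]: 11

The global a is unaffected by the call, and the local name x no longer exists once g returns: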
In [18]: x
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-2-401b30e3b8b5> in <module>()
----> 1 x
NameError: name 'x' is not defined
First,
The global namespace {} is created
The function object is created, and g is bound to it within the global namespace
The name a is bound to 0, again in the global namespace
Next g is called via y = g(10), leading to the following sequence of actions
The local namespace for the function is created
Local names x and a are bound, so that the local namespace becomes {x: 10, a: 1}
Statement x = x + a uses the local a and local x to compute x + a, and binds local name x
to the result
This value is returned, and y is bound to it in the global namespace
Local x and a are discarded (and the local namespace is deallocated)
Note that the global a was not affected by the local a
Mutable Versus Immutable Parameters This is a good time to say a little more about mutable
vs immutable objects
Consider the code segment
def f(x):
    x = x + 1
    return x

x = 1
print f(x), x
We now understand what will happen here: The code prints 2 as the value of f(x) and 1 as the
value of x
First f and x are registered in the global namespace
The call f(x) creates a local namespace and adds x to it, bound to 1
Next, this local x is rebound to the new integer object 2, and this value is returned
None of this affects the global x
However, its a different story when we use a mutable data type such as a list
def f(x):
    x[0] = x[0] + 1
    return x

x = [1]
print f(x), x

This prints [2] as the value of f(x) and also [2] as the value of x, since the call f(x) modifies the same list object in place
Contents
More Language Features
Overview
Handling Errors
Decorators and Descriptors
Generators
Recursive Function Calls
Exercises
Solutions
Overview
As with the last lecture, our advice is to skip this lecture on first pass, unless you have a burning
desire to read it
It's here
1. as a reference, so we can link back to it when required, and
2. for those who have worked through a number of applications, and now want to learn more
about the Python language
A variety of topics are treated in the lecture, including generators, exceptions and descriptors
Handling Errors

Sometimes it's possible to anticipate errors as we're writing code
For example, consider a function that computes the sample variance

s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2,  where \bar{y} is the sample mean

This computation fails if it is passed a sample of size one, since then n - 1 = 0
Letting the program simply die in such cases is a problem, because it reduces confidence in your code on the part of your users
Hence it's usually best to add code to your program that deals with errors as they occur
Assertions One of the easiest ways to handle these kinds of problems is with the assert keyword
For example, pretend for a moment that the np.var function doesn't exist and we need to write our own
In [19]: def var(y):
   ....:     n = len(y)
   ....:     assert n > 1, 'Sample size must be greater than one.'
   ....:     return np.sum((y - y.mean())**2) / float(n-1)
   ....:
If we run this with an array of length one, the program will terminate and print our error message
In [20]: var([1])
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-20-0032ff8a150f> in <module>()
----> 1 var([1])

<ipython-input-19-cefafaec3555> in var(y)
      1 def var(y):
      2     n = len(y)
----> 3     assert n > 1, 'Sample size must be greater than one.'
      4     return np.sum((y - y.mean())**2) / float(n-1)

AssertionError: Sample size must be greater than one.
One kind of error we cannot handle this way is a syntax error: since illegal syntax cannot be executed, a syntax error terminates execution of the program
Here's a different kind of error, unrelated to syntax

In [44]: 1 / 0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-17-05c9758a9c21> in <module>()
----> 1 1/0
ZeroDivisionError: integer division or modulo by zero
Here's another

In [45]: x1 = y1
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-23-142e0509fbd6> in <module>()
----> 1 x1 = y1
NameError: name 'y1' is not defined
And another

In [46]: 'foo' + 6
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-44bbe7e963e7> in <module>()
----> 1 'foo' + 6
TypeError: cannot concatenate 'str' and 'int' objects
And another

In [47]: X = []

In [48]: x = X[0]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-22-018da6d9fc14> in <module>()
----> 1 x = X[0]
IndexError: list index out of range
Catching Exceptions We can catch and deal with exceptions using try – except blocks
Here's a simple example that handles the kinds of errors seen above

def f(x):
    try:
        return 1.0 / x
    except ZeroDivisionError:
        print 'Error: division by zero.  Returned None'
    except TypeError:
        print 'Error: Unsupported operation.  Returned None'

When an error of one of these types occurs, the function prints a message and returns None instead of terminating the program

In [52]: f(0.0)
Error: division by zero.  Returned None

In [56]: f('foo')
Error: Unsupported operation.  Returned None
Decorators and Descriptors Let's look at some special syntax elements that are routinely used by Python developers
You might not need the following concepts immediately, but you will see them in other people's code
Hence you need to understand them at some stage of your Python education
Decorators Decorators are a bit of syntactic sugar that, while easily avoided, have in fact turned out to be rather popular
It's very easy to say what decorators do
On the other hand it takes a bit of effort to explain why you might use them
An Example Suppose we are working on a program that looks something like this
import numpy as np

def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

# Program continues with various calculations using f and g

Now suppose there's a problem: occasionally negative numbers get fed to f and g in the calculations that follow
If you try it, you'll see that when these functions are called with negative numbers they return a NumPy object called nan
Suppose further that this is not what we want because it causes other problems that are hard to
pick up
Suppose that instead we want the program to terminate whenever this happens with a sensible
error message
This change is easy enough to implement
import numpy as np

def f(x):
    assert x >= 0, "Argument must be nonnegative"
    return np.log(np.log(x))

def g(x):
    assert x >= 0, "Argument must be nonnegative"
    return np.sqrt(42 * x)

# Program continues with various calculations using f and g
Notice however that there is some repetition here, in the form of two identical lines of code
Repetition makes our code longer and harder to maintain, and hence is something we try hard to
avoid
Here it's not a big deal, but imagine now that instead of just f and g, we have 20 such functions that we need to modify in exactly the same way
This means we need to repeat the test logic (i.e., the assert line testing nonnegativity) 20 times
The situation is still worse if the test logic is longer and more complicated
In this kind of scenario the following approach would be neater
import numpy as np

def check_nonneg(func):
    def safe_function(x):
        assert x >= 0, "Argument must be nonnegative"
        return func(x)
    return safe_function

def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

f = check_nonneg(f)
g = check_nonneg(g)
# Program continues with various calculations using f and g
Enter Decorators The decorator syntax gives us a cleaner way to write the same thing: Python allows us to replace the lines

f = check_nonneg(f)
g = check_nonneg(g)

with
@check_nonneg
def f(x):
    return np.log(np.log(x))

@check_nonneg
def g(x):
    return np.sqrt(42 * x)
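The two definitions are now wrapped exactly as before. To confirm that the check is active, we can feed one of the decorated functions a negative number; under the definitions above, the assertion fires:

f(-1)    # raises AssertionError: Argument must be nonnegative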
Descriptors Descriptors solve a common problem regarding management of variables
To see the issue, consider a Car class that records mileage in both miles and kilometers

class Car(object):

    def __init__(self, miles=1000):
        self.miles = miles
        self.kms = miles * 1.61

One potential problem we might have here is that a user alters one of these variables but not the other
In [2]: car = Car()
In [3]: car.miles
Out[3]: 1000
In [4]: car.kms
Out[4]: 1610.0
In [5]: car.miles = 6000
In [6]: car.kms
Out[6]: 1610.0
In the last two lines we see that miles and kms are out of sync
What we really want is some mechanism whereby each time a user sets one of these variables, the
other is automatically updated
A Solution In Python, this issue is solved using descriptors
A descriptor is just a Python object that implements certain methods
These methods are triggered when the object is accessed through dotted attribute notation
The best way to understand this is to see it in action
Consider this alternative version of the Car class
class Car(object):

    def __init__(self, miles=1000):
        self._miles = miles
        self._kms = miles * 1.61

    def set_miles(self, value):
        self._miles = value
        self._kms = value * 1.61

    def set_kms(self, value):
        self._kms = value
        self._miles = value / 1.61

    def get_miles(self):
        return self._miles

    def get_kms(self):
        return self._kms

    miles = property(get_miles, set_miles)
    kms = property(get_kms, set_kms)

Now setting either attribute keeps the other in sync: after car.miles = 6000, the value of car.kms is 9660.0
Generators

Generator Expressions The easiest way to build a generator is as a generator expression: just like a list comprehension, but with round brackets

In [12]: sum((x * x for x in range(10)))
Out[12]: 285

The function sum() calls next() to get the items, adding successive terms
In fact, we can omit the outer brackets in this case
In [13]: sum(x * x for x in range(10))
Out[13]: 285
Generator Functions The most flexible way to create generator objects is to use generator functions
Let's look at some examples
Example 1 Here's a very simple example of a generator function
def f():
    yield 'start'
    yield 'middle'
    yield 'end'
It looks like a function, but it uses the keyword yield, which we haven't met before
Let's see how it works after running this code
In [15]: type(f)
Out[15]: function
In [16]: gen = f()
In [17]: gen
Out[17]: <generator object f at 0x3b66a50>
In [18]: gen.next()
Out[18]: 'start'
In [19]: gen.next()
Out[19]: 'middle'
In [20]: gen.next()
Out[20]: 'end'
In [21]: gen.next()
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-21-b2c61ce5e131> in <module>()
----> 1 gen.next()
StopIteration:
The generator function f() is used to create generator objects (in this case gen)
Generators are iterators, because they support a next() method
The first call to gen.next()
Executes code in the body of f() until it meets a yield statement
Returns that value to the caller of gen.next()
The second call to gen.next() starts executing from the next line

def f():
    yield 'start'
    yield 'middle'    # This line!
    yield 'end'
What's the advantage of using an iterator here? Suppose we want to count how many draws from a uniform distribution on (0, 1) fall below 0.5
One approach is to build the whole list of draws

In [31]: import random

In [32]: n = 10000000

In [33]: draws = [random.uniform(0, 1) < 0.5 for i in range(n)]

In [34]: sum(draws)

But we are creating two huge lists here, range(n) and draws
This uses lots of memory and is very slow
If we make n even bigger then this happens
In [35]: n = 1000000000
In [36]: draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-9-20d1ec1dae24> in <module>()
----> 1 draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
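The natural remedy is to avoid building the list at all; here is a sketch using a generator expression together with Python 2's lazy xrange:

import random

n = 10000000
# The generator expression produces draws one at a time; each is consumed
# immediately by sum() and then discarded, so no list is ever built
draws = (random.uniform(0, 1) < 0.5 for i in xrange(n))
print sum(draws)

Here memory use stays roughly constant in n, in contrast to the list-based version above.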
In summary, iterables
avoid the need to create big lists/tuples, and
provide a uniform interface to iteration that can be used transparently in for loops
Recursive Function Calls
This is not something that you will use every day, but it is still useful, and you should learn it at some stage
Basically, a recursive function is a function that calls itself
For example, consider the problem of computing x_t for some t when

x_{t+1} = 2 x_t,   x_0 = 1   (1.5)
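Here are two implementations, one iterative and one recursive (a sketch; the names x_loop and x are illustrative):

def x_loop(t):
    # Iterative: repeatedly apply x_{t+1} = 2 * x_t, starting from x_0 = 1
    x = 1
    for i in range(t):
        x = 2 * x
    return x

def x(t):
    # Recursive: the function calls itself on the smaller problem t - 1
    if t == 0:
        return 1
    else:
        return 2 * x(t - 1)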
What happens here is that each successive call uses its own frame in the stack
a frame is where the local variables of a given function call are held
the stack is the region of memory used to process function calls, operating on a first in, last out (FILO) basis
This example is somewhat contrived, since the first (iterative) solution would usually be preferred
to the recursive solution
We'll meet less contrived applications of recursion later on
Exercises
x0 = 0, x1 = 1
(1.6)
The first few numbers in the sequence are: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55
Write a function to recursively compute the t-th Fibonacci number for any t
Exercise 2 Complete the following code, and test it using this csv file, which we assume that you've put in your current working directory

def column_iterator(target_file, column_number):
    """A generator function for CSV files.
    When called with a file name target_file (string) and column number
    column_number (integer), the generator function returns a generator
    that steps through the elements of column column_number in file
    target_file.
    """
    # put your code here

dates = column_iterator('test_table.csv', 1)
Exercise 3 Suppose we have a text file numbers.txt containing the following lines
prices
3
8
7
21
Using try – except, write a program to read in the contents of the file and sum the numbers, ignoring lines without numbers
Solutions
Solution notebook
1.8 NumPy
Contents
NumPy
Overview
Introduction to NumPy
NumPy Arrays
Operations on Arrays
Other NumPy Functions
Exercises
Solutions
Lets be clear: the work of science has nothing whatever to do with consensus. Consensus is the business of politics. Science, on the contrary, requires only one investigator who happens to be right, which means that he or she has results that are verifiable
by reference to the real world. In science consensus is irrelevant. What is relevant is
reproducible results. Michael Crichton
Overview
In this lecture we introduce the NumPy array data type and fundamental array processing operations
We assume that NumPy is installed on the machine you are using; see this page for instructions
References
The official NumPy documentation
Introduction to NumPy
The most important thing that NumPy defines is an array data type, formally called a numpy.ndarray
For example, the np.zeros function returns a numpy.ndarray of zeros
In [1]: import numpy as np
In [2]: a = np.zeros(3)
In [3]: a
Out[3]: array([ 0.,  0.,  0.])
In [4]: type(a)
Out[4]: numpy.ndarray
NumPy arrays are somewhat like native Python lists, except that
Data must be homogeneous (all elements of the same type)
These types must be one of the data types (dtypes) provided by NumPy
The most important of these dtypes are:
float64: 64 bit floating point number
float32: 32 bit floating point number
int64: 64 bit integer
int32: 32 bit integer
bool: 8 bit True or False
There are also dtypes to represent complex numbers, unsigned integers, etc
On most machines, the default dtype for arrays is float64
In [7]: a = np.zeros(3)
In [8]: type(a[0])
Out[8]: numpy.float64
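If we want a different dtype, we can request it explicitly when creating the array; a small illustration:

import numpy as np

z = np.zeros(3, dtype=int)    # Request an integer dtype instead of the default float64
print(type(z[0]))             # A NumPy integer type, e.g. numpy.int64 on most 64-bit machines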
In [11]: z = np.zeros(10)

In [12]: z.shape
Out[12]: (10,)

Here the shape tuple has only one element, which is the length of the array (tuples with one element end with a comma)
To give it dimension, we can change the shape attribute
In [13]: z.shape = (10, 1)
In [14]: z
Out[14]:
array([[ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.]])
In [15]: z = np.zeros(4)

In [16]: z.shape = (2, 2)

In [17]: z
Out[17]:
array([[ 0.,  0.],
       [ 0.,  0.]])
In the last case, to make the 2 by 2 array, we could also pass a tuple to the zeros() function, as in
z = np.zeros((2, 2))
Creating Arrays As we've seen, the np.zeros function creates an array of zeros
You can probably guess what np.ones creates
Related is np.empty, which creates arrays in memory that can later be populated with data
In [18]: z = np.empty(3)

In [19]: z
Out[19]: array([  8.90030222e-307,   4.94944794e+173,   4.04144187e-262])
To create an identity matrix, use either np.identity or np.eye

In [21]: z = np.identity(2)

In [22]: z
Out[22]:
array([[ 1.,  0.],
       [ 0.,  1.]])
In addition, NumPy arrays can be created from Python lists, tuples, etc. using np.array
In [23]: z = np.array([10, 20])
In [24]: z
Out[24]: array([10, 20])
In [25]: type(z)
Out[25]: numpy.ndarray
In [26]: z = np.array((10, 20), dtype=float)

In [27]: z
Out[27]: array([ 10.,  20.])
In [28]: z = np.array([[1, 2], [3, 4]])    # 2D array from a list of lists

In [29]: z
Out[29]:
array([[1, 2],
       [3, 4]])
See also np.asarray, which performs a similar function, but does not make a distinct copy of data
already in a NumPy array
In [11]: na = np.linspace(10, 20, 2)
In [12]: na is np.asarray(na)
Out[12]: True
In [13]: na is np.array(na)
Out[13]: False
To read in the array data from a text file containing numeric data use np.loadtxt or np.genfromtxt; see the documentation for details
Array Indexing For a flat array, indexing is the same as Python sequences:
In [30]: z = np.linspace(1, 2, 5)

In [31]: z
Out[31]: array([ 1.  ,  1.25,  1.5 ,  1.75,  2.  ])
In [32]: z[0]
Out[32]: 1.0
In [33]: z[0:2] # Slice numbering is left closed, right open
Out[33]: array([ 1. , 1.25])
In [34]: z[-1]
Out[34]: 2.0
In [35]: z = np.array([[1, 2], [3, 4]])
In [36]: z
Out[36]:
array([[1, 2],
[3, 4]])
In [37]: z[0, 0]
Out[37]: 1
In [38]: z[0, 1]
Out[38]: 2
And so on
Note that indices are still zero-based, to maintain compatibility with Python sequences
Columns and rows can be extracted as follows
In [39]: z[0,:]
Out[39]: array([1, 2])
In [40]: z[:,1]
Out[40]: array([2, 4])
Arrays can also be indexed with arrays of integers or booleans

In [41]: z = np.linspace(2, 4, 5)

In [42]: z
Out[42]: array([ 2. ,  2.5,  3. ,  3.5,  4. ])

In [43]: indices = np.array((0, 2, 3))

In [44]: z[indices]
Out[44]: array([ 2. ,  3. ,  3.5])

An array of dtype bool can also be used to extract elements

In [45]: d = np.array([0, 1, 1, 0, 0], dtype=bool)

In [46]: d
Out[46]: array([False,  True,  True, False, False], dtype=bool)

In [47]: z[d]
Out[47]: array([ 2.5,  3. ])
An aside: all elements of an array can be set equal to one number using slice notation

In [49]: z = np.empty(3)

In [50]: z
Out[50]: array([ -1.25236750e-041,   0.00000000e+000,   5.45693855e-313])

In [51]: z[:] = 42

In [52]: z
Out[52]: array([ 42.,  42.,  42.])
Array Methods Arrays have useful methods, all of which are highly optimized

In [53]: A = np.array((4, 3, 2, 1))

In [54]: A
Out[54]: array([4, 3, 2, 1])

In [55]: A.sort()              # Sorts A in place

In [56]: A
Out[56]: array([1, 2, 3, 4])

In [57]: A.sum()               # Sum
Out[57]: 10

In [58]: A.mean()              # Mean
Out[58]: 2.5

In [59]: A.max()               # Max
Out[59]: 4

In [60]: A.argmax()            # Index of the maximal element
Out[60]: 3

In [61]: A.cumsum()            # Cumulative sum of the elements of A
Out[61]: array([ 1,  3,  6, 10])

In [62]: A.cumprod()           # Cumulative product of the elements of A
Out[62]: array([ 1,  2,  6, 24])

In [63]: A.var()               # Variance
Out[63]: 1.25

In [64]: A.std()               # Standard deviation
Out[64]: 1.1180339887498949

In [65]: A.shape = (2, 2)

In [66]: A.T                   # Equivalent to A.transpose()
Out[66]:
array([[1, 3],
       [2, 4]])
Another useful method is searchsorted(): if z is a nondecreasing array, then z.searchsorted(a) returns the index of the first element of z that is >= a

In [68]: z = np.linspace(2, 4, 5)    # z is array([ 2. ,  2.5,  3. ,  3.5,  4. ])
In [69]: z.searchsorted(2.2)
Out[69]: 1
In [70]: z.searchsorted(2.5)
Out[70]: 1
In [71]: z.searchsorted(2.6)
Out[71]: 2
Many of the methods discussed above have equivalent functions in the NumPy namespace
In [72]: a = np.array((4, 3, 2, 1))
In [73]: np.sum(a)
Out[73]: 10
In [74]: np.mean(a)
Out[74]: 2.5
Operations on Arrays
Algebraic Operations The algebraic operators +, -, *, / and ** all act elementwise on arrays
In [75]: a = np.array([1, 2, 3, 4])
In [76]: b = np.array([5, 6, 7, 8])
In [77]: a + b
Out[77]: array([ 6,  8, 10, 12])
In [78]: a * b
Out[78]: array([ 5, 12, 21, 32])
In [82]: a * 10
Out[82]: array([10, 20, 30, 40])
With np.dot we can also take the inner product of two flat arrays
In [91]: A = np.array([1, 2])
In [92]: B = np.array([10, 20])
In [93]: np.dot(A, B)
Out[93]: 50
In fact we can use dot when one argument is a Python list or tuple

In [94]: A = np.empty((2, 2))

In [95]: A
Out[95]:
array([[  3.48091887e-262,   1.14802984e-263],
       [  3.61513512e-313,  -1.25232371e-041]])

In [96]: np.dot(A, (0, 1))
Out[96]: array([  1.14802984e-263,  -1.25232371e-041])
Comparisons As a rule, comparisons on arrays are done elementwise

In [97]: z = np.array([2, 3])

In [98]: y = np.array([2, 3])

In [99]: z == y
Out[99]: array([ True,  True], dtype=bool)

In [100]: y[0] = 5

In [101]: z == y
Out[101]: array([False,  True], dtype=bool)

In [102]: z != y
Out[102]: array([ True, False], dtype=bool)
We can also do comparisons against scalars

In [103]: z = np.linspace(0, 10, 5)

In [104]: z
Out[104]: array([  0. ,   2.5,   5. ,   7.5,  10. ])

In [105]: z > 3
Out[105]: array([False, False,  True,  True,  True], dtype=bool)

This is particularly useful for conditional extraction

In [106]: b = z > 3

In [107]: b
Out[107]: array([False, False,  True,  True,  True], dtype=bool)

In [108]: z[b]
Out[108]: array([  5. ,   7.5,  10. ])

Of course we can, and frequently do, perform this in one step

In [109]: z[z > 3]
Out[109]: array([  5. ,   7.5,  10. ])
Vectorized Functions NumPy provides versions of the standard functions log, exp, sin, etc. that
act elementwise on arrays
In [110]: z = np.array([1, 2, 3])

In [111]: np.sin(z)
Out[111]: array([ 0.84147098,  0.90929743,  0.14112001])
Because they act elementwise on arrays, these functions are called vectorized functions
In NumPy-speak, they are also called ufuncs, which stands for universal functions
As we saw above, the usual arithmetic operations (+, *, etc.) also work elementwise, and combining these with the ufuncs gives a very large set of fast elementwise functions
In [112]: z
Out[112]: array([1, 2, 3])
In [113]: (1 / np.sqrt(2 * np.pi)) * np.exp(- 0.5 * z**2)
Out[113]: array([ 0.24197072, 0.05399097, 0.00443185])
However, not every function can be relied on to act elementwise; for example, a function built around an if / else test will choke on a whole array
In this situation you should use the vectorized NumPy function np.where

In [114]: import numpy as np

In [115]: x = np.random.randn(4)

In [116]: x
Out[116]: array([-0.25521782,  ...])

In [117]: np.where(x > 0, 1, 0)    # Insert 1 if x > 0 true, otherwise 0
Out[117]: array([0, 1, 0, 0])
Although it's usually better to hand code vectorized functions from vectorized NumPy operations, at a pinch you can use np.vectorize
In [118]: def f(x): return 1 if x > 0 else 0

In [119]: f = np.vectorize(f)

In [120]: f(x)    # Passing same vector x as previous example
Out[120]: array([0, 1, 0, 0])
Other NumPy Functions NumPy provides additional functionality related to scientific programming; for example, the np.linalg subpackage computes standard linear algebra operations

In [132]: A = np.array([[1, 2], [3, 4]])

In [133]: np.linalg.inv(A)    # Compute the inverse
Out[133]:
array([[-2. ,  1. ],
       [ 1.5, -0.5]])

For random numbers, np.random supplies draws from many distributions

In [134]: Z = np.random.randn(10000)    # 10,000 standard normal draws

In [135]: y = np.random.binomial(10, 0.5, size=1000)    # 1,000 draws from Bin(10, 0.5)

In [136]: y.mean()
Out[136]: 5.0369999999999999
However, all of this functionality is also available in SciPy, a collection of modules that build on
top of NumPy
We'll cover the SciPy versions in more detail soon
Exercises

Exercise 1 Consider the polynomial expression

p(x) = a_0 + a_1 x + \cdots + a_N x^N = \sum_{n=0}^{N} a_n x^n   (1.7)
Earlier, you wrote a simple function p(x, coeff) to evaluate (1.7) without considering efficiency
Now write a new function that does the same job, but uses NumPy arrays and array operations
for its computations, rather than any form of Python loop
(Such functionality is already implemented as np.poly1d, but for the sake of the exercise dont use
this class)
Hint: Use np.cumprod()
Exercise 2 Recall the algorithm we used earlier to generate draws from a discrete distribution q: draw a uniform random variable U on (0, 1), form the cumulative sums of q, and return the index of the first interval that contains U
If you can't see how this works, try thinking through the flow for a simple example, such as q = [0.25, 0.75]; it helps to sketch the intervals on paper
Your exercise is to speed it up using NumPy, avoiding explicit loops
Hint: Use np.searchsorted and np.cumsum
If you can, implement the functionality as a class called discreteRV, where
the data for an instance of the class is the vector of probabilities q
the class has a draw() method, which returns one draw according to the algorithm described
above
If you can, write the method so that draw(k) returns k draws from q
Exercise 3 Recall our earlier discussion of the empirical distribution function
Your task is to
1. Make the __call__ method more efficient using NumPy
2. Add a method that plots the ECDF over [ a, b], where a and b are method parameters
Solutions
Solution notebook
1.9 SciPy
Contents
SciPy
SciPy versus NumPy
Statistics
Roots and Fixed Points
Optimization
Integration
Linear Algebra
Exercises
Solutions
SciPy builds on top of NumPy to provide common tools for scientific programming, such as
linear algebra
numerical integration
interpolation
optimization
distributions and random number generation
signal processing
etc., etc
Like NumPy, SciPy is stable, mature and widely used
Many SciPy routines are thin wrappers around industry-standard Fortran libraries such as LAPACK, BLAS, etc.
It's not really necessary to learn SciPy as a whole; a better approach is to learn each relevant feature as required
You can browse from the top of the documentation tree to see what's available
In this lecture we aim only to highlight some useful parts of the package
SciPy versus NumPy
SciPy is a package that contains various tools that are built on top of NumPy, using its array data
type and related functionality
In fact, when we import SciPy we also get NumPy, as can be seen from the SciPy initialization file
# Import numpy symbols to scipy name space
from numpy import *
from numpy.random import rand, randn
from numpy.fft import fft, ifft
from numpy.lib.scimath import *

# Remove the linalg imported from numpy so that the scipy.linalg package can be
# imported.
del linalg
Although SciPy imports NumPy, the standard approach is to start scientific programs with
import numpy as np
This approach helps clarify what functionality belongs to what package, and we will follow it in
these lectures
Statistics

The np.random subpackage supplies draws from many standard distributions; for example, np.random.beta(5, 5, size=3) returns an array of three draws such as array([..., 0.32346476])
These are draws from the beta distribution with density

f(x; a, b) = \frac{\Gamma(a + b)}{\Gamma(a) \Gamma(b)} x^{a-1} (1 - x)^{b-1}   (0 \le x \le 1)   (1.8)
Sometimes we need access to the density itself, or the cdf, the quantiles, etc.
For this we can use scipy.stats, which provides all of this functionality as well as random number generation in a single consistent interface
Here's an example of usage

import numpy as np
from scipy.stats import beta
from matplotlib.pyplot import hist, plot, show

q = beta(5, 5)      # Beta(a, b), with a = b = 5
obs = q.rvs(2000)   # 2000 observations
hist(obs, bins=40, normed=True)
grid = np.linspace(0.01, 0.99, 100)
plot(grid, q.pdf(grid), 'k-', linewidth=2)
show()
In this code we created a so-called rv_frozen object, via the call q = beta(5, 5)
The "frozen" part of the notation relates to the fact that q represents a particular distribution with a particular set of parameters
Once we've done so, we can then generate random numbers, evaluate the density, etc., all from this fixed distribution
In [14]: q.cdf(0.4)    # Cumulative distribution function
Out[14]: 0.2665676800000002

In [15]: q.pdf(0.4)    # Density function
Out[15]: 2.0901888000000004

In [16]: q.ppf(0.8)    # Quantile (inverse cdf) function
Out[16]: 0.63391348346427079

In [17]: q.mean()
Out[17]: 0.5
Other Goodies in scipy.stats There are also many statistical functions in scipy.stats
For example, scipy.stats.linregress implements simple linear regression
In [19]: from scipy.stats import linregress
In [20]: x = np.random.randn(200)
In [21]: y = 2 * x + 0.1 * np.random.randn(200)
In [22]: gradient, intercept, r_value, p_value, std_err = linregress(x, y)
In [23]: gradient, intercept
Out[23]: (1.9962554379482236, 0.008172822032671799)
Roots and Fixed Points A root of a real function f on [a, b] is an x \in [a, b] such that f(x) = 0
As a test case for the root-finding routines below, consider the function

f(x) = \sin(4 (x - 1/4)) + x + x^{20} - 1   (1.9)

which changes sign on the interval [0, 1]
The bisection algorithm locates the root by repeatedly halving an interval on which the sign change occurs
In fact SciPy provides its own bisection function, which we now test using the function f defined
in (1.9)
In [24]: from scipy.optimize import bisect
In [25]: f = lambda x: np.sin(4 * (x - 0.25)) + x + x**20 - 1
In [26]: bisect(f, 0, 1)
Out[26]: 0.40829350427936706
The Newton-Raphson Method Another very common root-finding algorithm is the Newton-Raphson method
In SciPy this algorithm is implemented by scipy.optimize.newton
Unlike bisection, the Newton-Raphson method uses local slope information
This is a double-edged sword:
When the function is well-behaved, the Newton-Raphson method is faster than bisection
When the function is less well-behaved, the Newton-Raphson might fail
Let's investigate this using the same function f, first looking at potential instability
In [27]: from scipy.optimize import newton
In [28]: newton(f, 0.2)    # Start the search at initial condition x = 0.2
Out[28]: 0.40829350427935679

In [29]: newton(f, 0.7)    # Start the search at x = 0.7 instead
Out[29]: 0.70017000000002816
Hybrid Methods So far we have seen that the Newton-Raphson method is fast but not robust, while the bisection algorithm is robust but relatively slow
This illustrates a general principle
If you have specific knowledge about your function, you might be able to exploit it to generate efficiency
If not, then algorithm choice involves a trade-off between speed of convergence and robustness
In practice, most default algorithms for root finding, optimization and fixed points use hybrid
methods
These methods typically combine a fast method with a robust method in the following manner:
1. Attempt to use a fast method
2. Check diagnostics
3. If diagnostics are bad, then switch to a more robust algorithm
In scipy.optimize, the function brentq is such a hybrid method, and a good default
In [35]: brentq(f, 0, 1)
Out[35]: 0.40829350427936706
In [36]: timeit brentq(f, 0, 1)
10000 loops, best of 3: 63.2 us per loop
Here the correct solution is found and the speed is almost the same as newton
Multivariate Root Finding Use scipy.optimize.fsolve, a wrapper for a hybrid method in
MINPACK
See the documentation for details
Fixed Points SciPy has a function for finding (scalar) fixed points too
In [1]: from scipy.optimize import fixed_point
In [2]: fixed_point(lambda x: x**2, 10.0)
Out[2]: 1.0
124
1.9. SCIPY
If you don't get good results, you can always switch back to the brentq root finder, since the fixed point of a function f is the root of g(x) := x - f(x)
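As a small sketch of this trick (the map f below is illustrative):

from scipy.optimize import brentq

f = lambda x: 0.5 * (x + 2)    # A simple map whose fixed point is x = 2
g = lambda x: x - f(x)         # A fixed point of f is a root of g
print(brentq(g, 0, 10))        # -> 2.0

Note that brentq requires a sign change of g on the bracketing interval, which holds here since g(0) < 0 < g(10).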
Optimization Most numerical packages provide only functions for minimization
Maximization can be performed by recalling that the maximizer of a function f on domain D is the minimizer of -f on D
For univariate problems, a reasonable default is fminbound from scipy.optimize

In [9]: from scipy.optimize import fminbound

In [10]: fminbound(lambda x: x**2, -1, 2)    # Search in [-1, 2]
Out[10]: 0.0

Integration
Most numerical integration methods work by computing the integral of an approximating polynomial
The resulting error depends on how well the polynomial fits the integrand, which in turn depends
on how regular the integrand is
In SciPy, the relevant module for numerical integration is scipy.integrate
A good default for univariate integration is quad
In [13]: from scipy.integrate import quad
In [14]: integral, error = quad(lambda x: x**2, 0, 1)
In [15]: integral
Out[15]: 0.33333333333333337
In fact quad is an interface to a very standard numerical integration routine in the Fortran library
QUADPACK
Linear Algebra We saw that NumPy provides a module for linear algebra called linalg
SciPy also provides a module for linear algebra with the same name
The latter is not an exact superset of the former, but overall it has more functionality
We leave you to investigate the set of available routines
Exercises
Exercise 1 Recall that we previously discussed the concept of recursive function calls
Write a recursive implementation of the bisection function described above, which we repeat here
for convenience
def bisect(f, a, b, tol=10e-5):
    """
    Implements the bisection root finding algorithm, assuming that f is a
    real-valued function on [a, b] satisfying f(a) < 0 < f(b).
    """
    lower, upper = a, b
    while upper - lower > tol:
        middle = 0.5 * (upper + lower)
        # === if root is between lower and middle === #
        if f(middle) > 0:
            lower, upper = lower, middle
        # === if root is between middle and upper === #
        else:
            lower, upper = middle, upper
    return 0.5 * (upper + lower)
Solutions
Solution notebook
1.10 Matplotlib
Contents
Matplotlib
Overview
A Simple API
The Object-Oriented API
More Features
Further Reading
Overview
We've already generated quite a few figures in these lectures using Matplotlib
Matplotlib is an outstanding graphics library, designed for scientific computing, with
high quality 2D and 3D plots
output in all the usual formats (PDF, PNG, etc.)
LaTeX integration
animation, etc., etc.
A Simple API
Matplotlib is very easy to get started with, thanks to its simple MATLAB-style API (Application Programming Interface)
Here's the kind of easy example you might find in introductory treatments

from pylab import *  # Deprecated
x = linspace(0, 10, 200)
y = sin(x)
plot(x, y, 'b-', linewidth=2)
show()
Also, from pylab import * pulls lots of names into the global namespace, which is a potential source of name conflicts
A better syntax would be
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 200)
y = np.sin(x)
plt.plot(x, y, 'b-', linewidth=2)
plt.show()
The API described above is simple and convenient, but also a bit limited and somewhat un-Pythonic
For example, in the function calls a lot of objects get created and passed around without making
themselves known to the programmer
Python programmers tend to prefer a more explicit style of programming (type import this in
the IPython (or Python) shell and look at the second line)
This leads us to the alternative, object oriented Matplotlib API
Here's the code corresponding to the preceding figure using the object oriented API:
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
x = np.linspace(0, 10, 200)
y = np.sin(x)
ax.plot(x, y, 'b-', linewidth=2)
plt.show()
While there's a bit more typing, the more explicit use of objects gives us more fine-grained control
This will become clearer as we go along
Incidentally, regarding the above lines of code,
the form of the import statement import matplotlib.pyplot as plt is standard
Here the call fig, ax = plt.subplots() returns a pair, where
fig is a Figure instance, like a blank canvas
ax is an AxesSubplot instance; think of it as a frame for plotting in
The plot() function is actually a method of ax
Tweaks Here we've changed the line to red and added a legend
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
x = np.linspace(0, 10, 200)
y = np.sin(x)
ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend()
plt.show()
We've also used alpha to make the line slightly transparent, which makes it look smoother
Unfortunately the legend is obscuring the line
This can be fixed by replacing ax.legend() with ax.legend(loc='upper center')
Matplotlib also understands basic LaTeX in text elements, so we can typeset the label mathematically, e.g. by passing label=r'$y = \sin(x)$' in the call to ax.plot
The r in front of the label string tells Python that this is a raw string
The figure now looks as follows
More Features
Matplotlib has a huge array of functions and features, which you can discover over time as you
have need for them
We mention just a few
Multiple Plots on One Axis It's straightforward to generate multiple plots on the same axes
Here's an example that randomly generates three normal densities and adds a label with their mean
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
from random import uniform

fig, ax = plt.subplots()
x = np.linspace(-4, 4, 150)
for i in range(3):
    m, s = uniform(-1, 1), uniform(1, 2)
    y = norm.pdf(x, loc=m, scale=s)
    current_label = r'$\mu = {0:.2f}$'.format(m)
    ax.plot(x, y, linewidth=2, alpha=0.6, label=current_label)
ax.legend()
plt.show()
In fact the preceding figure was generated by the code above preceded by the following three lines

from matplotlib import rc
rc('font', **{'family': 'serif', 'serif': ['Palatino']})
rc('text', usetex=True)

These lines instruct Matplotlib to render text in a serif font using LaTeX
Depending on your LaTeX installation, this may or may not work for you; try experimenting and see how you go
3D Plots Matplotlib does a nice job of 3D plots; here is one example
from mpl_toolkits.mplot3d.axes3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm

def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

xgrid = np.linspace(-3, 3, 50)
ygrid = xgrid
x, y = np.meshgrid(xgrid, ygrid)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x,
                y,
                f(x, y),
                rstride=2, cstride=2,
                cmap=cm.jet,
                alpha=0.7,
                linewidth=0.25)
ax.set_zlim(-0.5, 1.0)
plt.show()
A Customizing Function Perhaps you will find a set of customizations that you regularly use
Suppose we usually prefer our axes to go through the origin, and to have a grid
Here's a nice example from this blog of how the object-oriented API can be used to build a custom subplots function that implements these changes
Read carefully through the code and see if you can follow what's going on
import matplotlib.pyplot as plt
import numpy as np

def subplots():
    "Custom subplots with axes through the origin"
    fig, ax = plt.subplots()
    # Set the axes through the origin
    for spine in ['left', 'bottom']:
        ax.spines[spine].set_position('zero')
    for spine in ['right', 'top']:
        ax.spines[spine].set_color('none')
    ax.grid()
    return fig, ax

fig, ax = subplots()  # Call the local version, not plt.subplots()
x = np.linspace(-2, 10, 200)
y = np.sin(x)
ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend(loc='lower right')
plt.show()
Here's the figure it produces (note axes through the origin and the grid)
1.11 Pandas
Contents
Pandas
Overview
Series
DataFrames
On-Line Data Sources
Exercises
Solutions
Overview
Wikipedia defines munging as cleaning data from one raw form into a structured, purged one.
Series
Perhaps the two most important data types defined by pandas are the DataFrame and Series types
You can think of a Series as a column of data, such as a collection of observations on a single
variable
In [4]: s = pd.Series(np.random.randn(4), name='daily returns')

In [5]: s
Out[5]:
0    0.430271
1    0.617328
2   -0.265421
3   -0.836113
Name: daily returns
Here you can imagine the indices 0, 1, 2, 3 as indexing four listed companies, and the values
being daily returns on their shares
Pandas Series are built on top of NumPy arrays, and support many similar operations
In [6]: s * 100
Out[6]:
0    43.027108
1    61.732829
2   -26.542104
3   -83.611339
Name: daily returns

In [7]: np.abs(s)
Out[7]:
0    0.430271
1    0.617328
2    0.265421
3    0.836113
Name: daily returns
In [8]: s.describe()
Out[8]:
count    4.000000
mean    -0.013484
std      0.667092
min     -0.836113
25%     -0.408094
50%      0.082425
75%      0.477035
max      0.617328
In [9]: s.index = ['AMZN', 'AAPL', 'MSFT', 'GOOG']
In [10]: s
Out[10]:
AMZN    0.430271
AAPL    0.617328
MSFT   -0.265421
GOOG   -0.836113
Name: daily returns
Viewed in this way, Series are like fast, efficient Python dictionaries (with the restriction that the items in the dictionary all have the same type; in this case, floats)
In fact you can use much of the same syntax as Python dictionaries
In [11]: s['AMZN']
Out[11]: 0.43027108469945924
In [12]: s['AMZN'] = 0
In [13]: s
Out[13]:
AMZN    0.000000
AAPL    0.617328
MSFT   -0.265421
GOOG   -0.836113
Name: daily returns
In [14]: 'AAPL' in s
Out[14]: True
DataFrames
As mentioned above a DataFrame is somewhat like a spreadsheet, or a structure for storing the
data matrix in a regression
While a Series is one individual column of data, a DataFrame is all the columns
Let's look at an example, reading in data from the CSV file data/test_pwt.csv in the main repository
Here's the contents of test_pwt.csv, which is a small excerpt from the Penn World Tables
"country","country isocode","year","POP","XRAT","tcgdp","cc","cg"
"Argentina","ARG","2000","37335.653","0.9995","295072.21869","75.716805379","5.5788042896"
"Australia","AUS","2000","19053.186","1.72483","541804.6521","67.759025993","6.7200975332"
"India","IND","2000","1006300.297","44.9416","1728144.3748","64.575551328","14.072205773"
"Israel","ISR","2000","6114.57","4.07733","129253.89423","64.436450847","10.266688415"
"Malawi","MWI","2000","11801.505","59.543808333","5026.2217836","74.707624181","11.658954494"
"South Africa","ZAF","2000","45064.098","6.93983","227242.36949","72.718710427","5.7265463933"
"United States","USA","2000","282171.957","1","9898700","72.347054303","6.0324539789"
"Uruguay","URY","2000","3219.793","12.099591667","25255.961693","78.978740282","5.108067988"
Here we're in IPython, so we have access to shell commands such as ls, as well as the usual Python commands

In [15]: ls data/test_pw*    # List all files starting with 'test_pw' -- check the CSV file is in the present working directory
test_pwt.csv
We read the data in with read_csv

In [16]: df = pd.read_csv('data/test_pwt.csv')

In [17]: df
Out[17]:
         country country isocode  year          POP        XRAT           tcgdp         cc         cg
0      Argentina             ARG  2000    37335.653    0.999500   295072.218690  75.716805   5.578804
1      Australia             AUS  2000    19053.186    1.724830   541804.652100  67.759026   6.720098
2          India             IND  2000  1006300.297   44.941600  1728144.374800  64.575551  14.072206
3         Israel             ISR  2000     6114.570    4.077330   129253.894230  64.436451  10.266688
4         Malawi             MWI  2000    11801.505   59.543808     5026.221784  74.707624  11.658954
5   South Africa             ZAF  2000    45064.098    6.939830   227242.369490  72.718710   5.726546
6  United States             USA  2000   282171.957    1.000000  9898700.000000  72.347054   6.032454
7        Uruguay             URY  2000     3219.793   12.099592    25255.961693  78.978740   5.108068
We can select particular rows using standard Python array slicing notation

In [13]: df[2:5]
Out[13]:
  country country isocode  year          POP       XRAT           tcgdp         cc         cg
2   India             IND  2000  1006300.297  44.941600  1728144.374800  64.575551  14.072206
3  Israel             ISR  2000     6114.570   4.077330   129253.894230  64.436451  10.266688
4  Malawi             MWI  2000    11801.505  59.543808     5026.221784  74.707624  11.658954
To select columns, we can pass a list containing the names of the desired columns represented as strings

In [14]: df[['country', 'tcgdp']]
Out[14]:
         country           tcgdp
0      Argentina   295072.218690
1      Australia   541804.652100
2          India  1728144.374800
3         Israel   129253.894230
4         Malawi     5026.221784
5   South Africa   227242.369490
6  United States  9898700.000000
7        Uruguay    25255.961693
To select both rows and columns at once, we can use the ix attribute, passing row and column specifiers

In [21]: df.ix[2:5, ['country', 'tcgdp']]
Out[21]:
        country           tcgdp
2         India  1728144.374800
3        Israel   129253.894230
4        Malawi     5026.221784
5  South Africa   227242.369490
Let's imagine that we're only interested in population and total GDP (tcgdp)
In [31]: keep = ['country', 'POP', 'tcgdp']
In [32]: df = df[keep]
In [33]: df
Out[33]:
         country          POP           tcgdp
0      Argentina    37335.653   295072.218690
1      Australia    19053.186   541804.652100
2          India  1006300.297  1728144.374800
3         Israel     6114.570   129253.894230
4         Malawi    11801.505     5026.221784
5   South Africa    45064.098   227242.369490
6  United States   282171.957  9898700.000000
7        Uruguay     3219.793    25255.961693
Here the index 0, 1, ..., 7 is redundant, because we can use the country names as an index
To do this, first let's pull out the country column using the pop method
In [34]: countries = df.pop('country')
In [35]: type(countries)
Out[35]: pandas.core.series.Series
In [36]: countries
Out[36]:
0        Argentina
1        Australia
2            India
3           Israel
4           Malawi
5     South Africa
6    United States
7          Uruguay
Name: country
In [37]: df
Out[37]:
           POP           tcgdp
0    37335.653   295072.218690
1    19053.186   541804.652100
2  1006300.297  1728144.374800
3     6114.570   129253.894230
4    11801.505     5026.221784
5    45064.098   227242.369490
6   282171.957  9898700.000000
7     3219.793    25255.961693

Next we can use the countries Series as the index

In [38]: df.index = countries

In [39]: df
Out[39]:
                       POP           tcgdp
country
Argentina        37335.653   295072.218690
Australia        19053.186   541804.652100
India          1006300.297  1728144.374800
Israel            6114.570   129253.894230
Malawi           11801.505     5026.221784
South Africa     45064.098   227242.369490
United States   282171.957  9898700.000000
Uruguay           3219.793    25255.961693

Let's also give the columns more descriptive names

In [40]: df.columns = 'population', 'total GDP'

In [41]: df
Out[41]:
                population       total GDP
country
Argentina        37335.653   295072.218690
Australia        19053.186   541804.652100
India          1006300.297  1728144.374800
Israel            6114.570   129253.894230
Malawi           11801.505     5026.221784
South Africa     45064.098   227242.369490
United States   282171.957  9898700.000000
Uruguay           3219.793    25255.961693

Population is in thousands; let's revert to single units

In [42]: df['population'] = df['population'] * 1e3

In [43]: df
Out[43]:
               population       total GDP
country
Argentina        37335653   295072.218690
Australia        19053186   541804.652100
India          1006300297  1728144.374800
Israel            6114570   129253.894230
Malawi           11801505     5026.221784
South Africa     45064098   227242.369490
United States   282171957  9898700.000000
Uruguay           3219793    25255.961693
Next we're going to add a column showing real GDP per capita, multiplying by 1,000,000 as we go because total GDP is measured in millions

In [44]: df['GDP percap'] = df['total GDP'] * 1e6 / df['population']
In [45]: df
Out[45]:
               population       total GDP    GDP percap
country
Argentina        37335653   295072.218690   7903.229085
Australia        19053186   541804.652100  28436.433261
India          1006300297  1728144.374800   1717.324719
Israel            6114570   129253.894230  21138.672749
Malawi           11801505     5026.221784    425.896679
South Africa     45064098   227242.369490   5042.647686
United States   282171957  9898700.000000  35080.381854
Uruguay           3219793    25255.961693   7843.970620
One of the nice things about pandas DataFrame and Series objects is that they have methods for
plotting and visualization that work through Matplotlib
For example, we can easily generate a bar plot of GDP per capita
In [76]: df['GDP percap'].plot(kind='bar')
Out[76]: <matplotlib.axes.AxesSubplot at 0x2f22ed0>
In [77]: import matplotlib.pyplot as plt
In [78]: plt.show()
The plot is easier to read if we first sort the entries, from richest to poorest

In [79]: df = df.sort_index(by='GDP percap', ascending=False)

In [80]: df
Out[80]:
               population       total GDP    GDP percap
country
United States   282171957  9898700.000000  35080.381854
Australia        19053186   541804.652100  28436.433261
Israel            6114570   129253.894230  21138.672749
Argentina        37335653   295072.218690   7903.229085
Uruguay           3219793    25255.961693   7843.970620
South Africa     45064098   227242.369490   5042.647686
India          1006300297  1728144.374800   1717.324719
Malawi           11801505     5026.221784    425.896679
On-Line Data Sources
One option is to use urllib2, which is part of Python's standard library

In [37]: import urllib2

In [38]: web_page = urllib2.urlopen('http://cnn.com')

In [39]: web_page.next()
Out[39]: '<!DOCTYPE HTML>\n'
In [40]: web_page.next()
Out[40]: '<html lang="en-US">\n'
The next method returns successive lines from the file returned by CNN's web server, in this case the top level HTML page at the site cnn.com
Other methods include read, readline, readlines, etc.
The same idea can be used to access the CSV file discussed above
In [56]: url = 'http://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv'
In [57]: source = urllib2.urlopen(url)
In [58]: source.next()
Out[58]: 'DATE,VALUE\r\n'
In [59]: source.next()
Out[59]: '1948-01-01,3.4\r\n'
In [60]: source.next()
Out[60]: '1948-02-01,3.8\r\n'
We could now write some additional code to parse this text and store it as an array...
But this is unnecessary: pandas' read_csv function can handle the task for us
In [69]: source = urllib2.urlopen(url)
In [70]: data = pd.read_csv(source, index_col=0, parse_dates=True, header=None)
The data has been read into a pandas DataFrame called data that we can now manipulate in the
usual way
In [71]: type(data)
Out[71]: pandas.core.frame.DataFrame
In [72]: data.head()
Out[72]:
                1
0
DATE        VALUE
1948-01-01    3.4
1948-02-01    3.8
1948-03-01    4.0
1948-04-01    3.9
In [73]: data.describe()
Out[73]:
             1
count      786
unique      81
top        5.4
freq        31
Accessing Data with pandas Although it is worth understanding the low level procedures, for
the present case pandas can take care of all these messy details
(pandas puts a simple API (Application Programming Interface) on top of the kind of low level function calls we've just covered)
For example, we can obtain the same unemployment data for the period 2006–2012 inclusive as follows

In [77]: import pandas.io.data as web

In [78]: import datetime as dt

In [79]: start, end = dt.datetime(2006, 1, 1), dt.datetime(2012, 12, 31)

In [80]: data = web.DataReader('UNRATE', 'fred', start, end)

In [81]: data.plot()

In [82]: import matplotlib.pyplot as plt

In [83]: plt.show()

(If you're working in the IPython notebook, the last two lines can probably be omitted)
The resulting figure looks as follows
Data from the World Bank Let's look at one more example of downloading and manipulating data, this time from the World Bank
The World Bank collects and organizes data on a huge range of indicators
For example, here we find data on government debt as a ratio to GDP: http://data.worldbank.org/indicator/GC.DOD.TOTL.GD.ZS/countries
If you click on DOWNLOAD DATA you will be given the option to download the data as an
Excel file
The next program does this for you, parses the data from the Excel file into a pandas DataFrame, and plots time series for France, Germany, the US and Australia

"""
NOTE: Python 2 and Python 3 call the urllib package differently, so the
imports below branch on the interpreter version.
"""
import sys
import matplotlib.pyplot as plt
from pandas.io.excel import ExcelFile
if sys.version_info[0] == 2:
    from urllib import urlretrieve
elif sys.version_info[0] == 3:
    from urllib.request import urlretrieve
# == Get data and read into file gd.xls == #
wb_data_file_dir = "http://api.worldbank.org/datafiles/"
file_name = "GC.DOD.TOTL.GD.ZS_Indicator_MetaData_en_EXCEL.xls"
url = wb_data_file_dir + file_name
urlretrieve(url, "gd.xls")
# == Parse data into a DataFrame == #
gov_debt_xls = ExcelFile('gd.xls')
govt_debt = gov_debt_xls.parse('Sheet1', index_col=1, na_values=['NA'])
# == Take desired values and plot == #
govt_debt = govt_debt.transpose()
govt_debt = govt_debt[['AUS', 'DEU', 'FRA', 'USA']]
govt_debt = govt_debt[36:]
govt_debt.plot(lw=2)
plt.show()
Exercises
Exercise 1 Write a program to calculate the percentage price change since the start of the year for
the following shares
ticker_list = {'INTC': 'Intel',
'MSFT': 'Microsoft',
'IBM': 'IBM',
'BHP': 'BHP',
'RSH': 'RadioShack',
'TM': 'Toyota',
'AAPL': 'Apple',
'AMZN': 'Amazon',
'BA': 'Boeing',
'QCOM': 'Qualcomm',
'KO': 'Coca-Cola',
'GOOG': 'Google',
'SNE': 'Sony',
'PTR': 'PetroChina'}
Solution notebook
1.12 IPython Shell and Notebook
As you know by now, IPython is not really a scientific library; it's an enhanced Python command interface oriented towards scientific workflow
We've already discussed the IPython notebook and shell, starting in this lecture
Here we briefly review some more of IPythons features
We will work in the IPython shell, but almost all of the following applies to the notebook too
Line Magics As you know by now, any Python command can be typed into an IPython shell
In [1]: 'foo' * 2
Out[1]: 'foofoo'
A program foo.py in the current working directory can be executed using run
In [2]: run foo.py
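Timing Code For the timing discussion that follows, suppose the file temp.py defines two polynomial evaluators, p1 and p2, along the following lines (a sketch consistent with the calls and results shown below):

# Filename: temp.py (an illustrative pair of test functions)
import numpy as np

def p1(x, coef):
    # Evaluate sum of coef[i] * x**i with a pure Python generator expression
    return sum(a * x**i for i, a in enumerate(coef))

def p2(x, coef):
    # Evaluate the same polynomial using NumPy array operations
    X = np.ones(len(coef))
    X[1:] = x
    y = np.cumprod(X)    # y = [1, x, x**2, ...]
    return np.dot(coef, y)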
Note that p1 uses pure Python, whereas p2 uses NumPy arrays and should run faster
Here's how we can test this with the timeit magic

In [1]: run temp.py

In [2]: p1(10, (1, 2))
Out[2]: 21

In [3]: p2(10, (1, 2))    # Ditto
Out[3]: 21

In [4]: timeit p1(10, (1, 2))

In [5]: timeit p2(10, (1, 2))
For p1, average execution time was 1.15 milliseconds, while for p2 it was about 10 microseconds (i.e., millionths of a second), two orders of magnitude faster
Reloading Modules Here is one very common Python gotcha and a nice solution provided by
IPython
When we work with multiple files, changes in one file are not always visible in our program
To see this, suppose that you are working with files useful_functions.py and main_program.py
As the names suggest, the main program resides in main_program.py but imports functions from
useful_functions.py
You might have noticed that if you make a change to useful_functions.py and then re-run
main_program.py, the effect of that change isn't always apparent
Here's an example; first, the file useful_functions.py in the current directory
## Filename: useful_functions.py
def meaning_of_life():
    "Computes the meaning of life"
    return 42
Also suppose that the main program looks like this

## Filename: main_program.py
from useful_functions import meaning_of_life
x = meaning_of_life()
print("The meaning of life is: " + str(x))

Now suppose we edit useful_functions.py so that it returns 43 and re-run main_program.py; the printed value can still be 42
The reason is that useful_functions.py has been compiled to a byte code file, in preparation for
sending its instructions to the Python virtual machine
The byte code file will be called useful_functions.pyc, and live in the same directory as
useful_functions.py
Even though we've modified useful_functions.py, the main program can still be working from the stale byte code in useful_functions.pyc
The nicest way to get your dependencies to recompile is to use IPythons autoreload extension
In [3]: %load_ext autoreload
In [4]: autoreload 2
In [5]: run main_program.py
The meaning of life is: 43
If you want this behavior to load automatically when you start IPython, add these lines to your
ipython_config.py file
c.InteractiveShellApp.extensions = ['autoreload']
c.InteractiveShellApp.exec_lines = ['%autoreload 2']
Debugging Are you one of those programmers who fills their code with print statements when trying to debug their programs?
Hey, it's OK, we all used to do that
But today might be a good day to turn a new page, and start using a debugger
Debugging is a big topic, but it's actually very easy to learn the basics
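Suppose we have a script temp.py along the following lines (a sketch consistent with the traceback shown below):

import numpy as np
import matplotlib.pyplot as plt

def plot_log():
    fig, ax = plt.subplots(2, 1)   # Bug: this returns an array of two axes
    x = np.linspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()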
This code is intended to plot the log function over the interval [1, 2]
But there's an error here: plt.subplots(2, 1) should be just plt.subplots()
(The call plt.subplots(2, 1) returns a NumPy array containing two axes objects, suitable for having two subplots on the same figure)
Here's what happens when we run the code
In [1]: run temp.py
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/usr/lib/python2.7/dist-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
    176             else:
    177                 filename = fname
--> 178             __builtin__.execfile(filename, *where)

/home/john/temp/temp.py in <module>()
      8     plt.show()
      9
---> 10 plot_log()

/home/john/temp/temp.py in plot_log()
      5     fig, ax = plt.subplots(2, 1)
      6     x = np.linspace(1, 2, 10)
----> 7     ax.plot(x, np.log(x))
      8     plt.show()
      9

AttributeError: 'numpy.ndarray' object has no attribute 'plot'
The traceback shows that the error occurs at the method call ax.plot(x, np.log(x))
The error occurs because we have mistakenly made ax a NumPy array, and a NumPy array has
no plot method
But lets pretend that we dont understand this for the moment
We might suspect theres something wrong with ax, but when we try to investigate this object
In [2]: ax
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-2-645aedc8a285> in <module>()
----> 1 ax

NameError: name 'ax' is not defined
The problem is that ax was defined inside plot_log(), and the name is lost once that function
terminates
Let's try doing it a different way
First we run temp.py again, but this time we respond to the exception by typing debug
This will cause us to be dropped into the Python debugger at the point of execution just before the
exception occurs
In [1]: run temp.py
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/usr/lib/python2.7/dist-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
    176             else:
    177                 filename = fname
--> 178             __builtin__.execfile(filename, *where)

/home/john/temp/temp.py in <module>()
      8     plt.show()
      9
---> 10 plot_log()

/home/john/temp/temp.py in plot_log()
      5     fig, ax = plt.subplots(2, 1)
      6     x = np.linspace(1, 2, 10)
----> 7     ax.plot(x, np.log(x))
      8     plt.show()
      9

AttributeError: 'numpy.ndarray' object has no attribute 'plot'

In [2]: debug
> /home/john/temp/temp.py(7)plot_log()
      6     x = np.linspace(1, 2, 10)
----> 7     ax.plot(x, np.log(x))
      8     plt.show()

ipdb>
We're now at the ipdb> prompt, at which we can investigate the value of our variables at this point
in the program, step forward through the code, etc.
For example, here we simply type the name ax to see what's happening with this object
ipdb> ax
array([<matplotlib.axes.AxesSubplot object at 0x290f5d0>,
<matplotlib.axes.AxesSubplot object at 0x2930810>], dtype=object)
It's now very clear that ax is an array, which clarifies the source of the problem
To find out what else you can do from inside ipdb (or pdb), use the online help
ipdb> h

Documented commands (type help <topic>):
========================================
EOF    bt         cont      enable  jump  pdef   r        tbreak   w
a      c          continue  exit    l     pdoc   restart  u        whatis
alias  cl         d         h       list  pinfo  return   unalias  where
args   clear      debug     help    n     pp     run      unt
b      commands   disable   ignore  next  q      s        until
break  condition  down      j       p     quit   step     up
Setting a Break Point The preceding approach is handy but sometimes insufficient
For example, consider the following modified version of temp.py
import numpy as np
import matplotlib.pyplot as plt

def plot_log():
    fig, ax = plt.subplots()
    x = np.logspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()
Here the original problem is fixed, but we've accidentally written np.logspace(1, 2, 10) instead
of np.linspace(1, 2, 10)
Now there won't be any exception, but the plot will not look right
To use the debugger to investigate, we can add a break point by inserting the line
import ipdb; ipdb.set_trace() in a suitable location
import numpy as np
import matplotlib.pyplot as plt

def plot_log():
    import ipdb; ipdb.set_trace()
    fig, ax = plt.subplots()
    x = np.logspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()
Now let's run the script, and investigate via the debugger
In [3]: run temp.py
> /home/john/temp/temp.py(6)plot_log()
      5     import ipdb; ipdb.set_trace()
----> 6     fig, ax = plt.subplots()
      7     x = np.logspace(1, 2, 10)

ipdb> n
> /home/john/temp/temp.py(7)plot_log()
      6     fig, ax = plt.subplots()
----> 7     x = np.logspace(1, 2, 10)
      8     ax.plot(x, np.log(x))

ipdb> n
> /home/john/temp/temp.py(8)plot_log()
      7     x = np.logspace(1, 2, 10)
----> 8     ax.plot(x, np.log(x))
      9     plt.show()

ipdb> x
array([  10.        ,   12.91549665,   16.68100537,   21.5443469 ,
         27.82559402,   35.93813664,   46.41588834,   59.94842503,
         77.42636827,  100.        ])
Here we used n twice to step forward through the code (one line at a time), and then printed the
value of x to see what was happening with that variable
Python in the Cloud
One thing that comes in handy here is that if you want to issue terminal commands such as
git clone https://github.com/QuantEcon/QuantEcon.py
you can do it from a notebook cell as long as you put a ! in front of the command
For example
!git clone https://github.com/QuantEcon/QuantEcon.py
If this works, you should now have the main repository sitting in your pwd, and you can cd into
it and get programming in the same manner described above
The big difference is that your programs are now running on Amazon's massive web service
infrastructure!
Overview
2. For those lines of code that are time-critical, we can achieve C-like speeds with minor modifications
This lecture will walk you through some of the most popular options for implementing this last
step
(A number of other useful options are mentioned below)
Note: In what follows we often ask you to execute code in an IPython notebook cell. Such code
will not run outside the notebook without modifications. This is because we take advantage of
some IPython line and cell magics
Let's start by trying to understand why high level languages like Python are slower than compiled
code
Dynamic Typing Consider this Python operation
In [1]: a, b = 10, 10
In [2]: a + b
Out[2]: 20
Even for this simple operation, the Python interpreter has a fair bit of work to do
For example, in the statement a + b, the interpreter has to know which operation to invoke
If a and b are strings, then a + b requires string concatenation
In [3]: a, b = 'foo', 'bar'
In [4]: a + b
Out[4]: 'foobar'
(We say that the operator + is overloaded; its action depends on the type of the objects on which
it acts)
As a result, Python must check the type of the objects and then call the correct operation
This involves substantial overheads
Static Types Compiled languages avoid these overheads with explicit, static types
For example, consider the following C code, which sums the integers from 1 to 10

#include <stdio.h>

int main(void) {
    int i;
    int sum = 0;
    for (i = 1; i <= 10; i++) {
        sum = sum + i;
    }
    printf("sum = %d\n", sum);
    return 0;
}
Now try executing the following in a notebook cell (this assumes that random and numpy have been imported, the latter as np)

%%timeit
n = 100000
sum = 0
for i in range(n):
    x = random.uniform(0, 1)
    sum += x**2

Followed by

%%timeit
n = 100000
x = np.random.uniform(0, 1, n)
np.sum(x**2)
You should find that the second code block, which achieves the same thing as the first, runs
one to two orders of magnitude faster
The reason is that in the second implementation we have broken the loop down into three basic
operations
1. draw n uniforms
2. square them
3. sum them
These are sent as batch operations to optimized machine code
Apart from minor overheads associated with sending data back and forth, the result is C- or
Fortran-like speed
When we run batch operations on arrays like this, we say that the code is vectorized
Although there are exceptions, vectorized code is typically fast and efficient
It is also surprisingly flexible, in the sense that many operations can be vectorized
The next section illustrates this point
Universal Functions Many functions provided by NumPy are so-called universal functions,
also called ufuncs
This means that they
map scalars into scalars, as expected
map arrays into arrays, acting elementwise
For example, np.cos is a ufunc:
In [1]: import numpy as np
In [2]: np.cos(1.0)
Out[2]: 0.54030230586813977
In [3]: np.cos(np.linspace(0, 1, 3))
Out[3]: array([ 1., 0.87758256, 0.54030231])
For example, suppose that we want to maximize the function

f(x, y) = cos(x² + y²) / (1 + x² + y²)

over the square [−a, a] × [−a, a], with a = 3

Here's a plot of f
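The figure code and the two implementations being compared are not reproduced on this page; a sketch consistent with the discussion (the grid size is our assumption) is

import numpy as np

def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

grid = np.linspace(-3, 3, 1000)

# Non-vectorized maximization: nested Python loops over the grid
m = -np.inf
for x in grid:
    for y in grid:
        z = f(x, y)
        if z > m:
            m = z

# Vectorized maximization: evaluate f over a mesh in one batch operation
x, y = np.meshgrid(grid, grid)
m = np.max(f(x, y))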
In the vectorized version all the looping takes place in compiled code
If you add %%timeit to the top of these code snippets and run them in a notebook cell, you'll see
that the second version is much faster, by about two orders of magnitude
Pros and Cons of Vectorization At its best, vectorization yields fast, simple code
However, it's not without disadvantages
One issue is that it can be highly memory intensive
For example, the vectorized maximization routine above is far more memory intensive than the
non-vectorized version that preceded it
Another issue is that not all algorithms can be vectorized
In these kinds of settings, we need to go back to loops
Fortunately, there are very nice ways to speed up Python loops
Numba
One of the most exciting developments in recent years in terms of scientific Python is Numba
Numba aims to automatically compile functions to native machine code instructions on the fly
The process isn't flawless, since Numba needs to infer type information on all variables to generate
pure machine instructions
Such inference isn't possible in every setting
But for simple routines Numba infers types very well
Moreover, the hot loops at the heart of our code that we need to speed up are often such simple
routines
Prerequisites If you followed our set up instructions and installed Anaconda, then you'll be ready
to use Numba
If not, try import numba
If you get no complaints then you should be good to go
If you do experience problems here or below then consider installing Anaconda
If you do have Anaconda installed, now might be a good time to run conda update numba from a
system terminal
An Example Let's consider some problems that are difficult to vectorize
One is generating the trajectory of a difference equation given an initial condition
Let's take the difference equation to be the quadratic map

xt+1 = 4 xt (1 − xt)

Here's the plot of a typical trajectory, starting from x0 = 0.1, with t on the x-axis
Now here's a function to generate a trajectory of a given length from a given initial condition

def qm(x0, n):
    x = np.empty(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4 * x[t] * (1 - x[t])
    return x
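The compiled version qm_numba isn't reproduced on this page; a minimal sketch, assuming Numba's jit decorator is applied to the same body, is

from numba import jit

@jit
def qm_numba(x0, n):
    # Same body as qm; Numba compiles it to machine code on first call
    x = np.empty(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4 * x[t] * (1 - x[t])
    return x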
Here the function body is identical to qm; the name has been changed only to aid speed comparisons
Timing the function calls qm(0.1, 100000) and qm_numba(0.1, 100000) gives us a speed-up factor on the order of 400 times
Your mileage may vary depending on your hardware and version of Numba, but anything in this
neighborhood is remarkable given how trivial the implementation is
How and When it Works Numba attempts to generate fast machine code using the infrastructure provided by the LLVM Project
It does this by inferring type information on the fly
As you can imagine, this is easier for simple Python objects (simple scalar data types, such as
floats, integers, etc.)
Numba also plays well with NumPy arrays, which it treats as typed memory regions
In an ideal setting, Numba can infer all necessary type information
This allows it to generate native machine code, without having to call the Python runtime environment
In such a setting, Numba will be on par with machine code from low level languages
When Numba cannot infer all type information, some Python objects are given generic object
status, and some code is generated using the Python runtime
In this second setting, Numba typically provides only minor speed gains or none at all
Hence it's prudent when using Numba to focus on speeding up small, time-critical snippets of
code
This will give you much better performance than blanketing your Python programs with @jit
statements
Cython
Like Numba, Cython provides an approach to generating fast compiled code that can be used
from Python
As was the case with Numba, a key problem is the fact that Python is dynamically typed
As you'll recall, Numba solves this problem (where possible) by inferring type
Cython's approach is different: programmers add type definitions directly to their Python
code
As such, the Cython language can be thought of as Python with type definitions
In addition to a language specification, Cython is also a language translator, transforming Cython
code into optimized C and C++ code
Cython also takes care of building language extensions, the wrapper code that interfaces between the resulting compiled code and Python
As we'll see, Cython is particularly easy to use from within the IPython notebook
A First Example Let's start with a rather artificial example
Suppose that we want to compute the sum Σ_{i=0}^n α^i for given α, n
Suppose further that we've forgotten the basic formula

Σ_{i=0}^n α^i = (1 − α^{n+1}) / (1 − α)
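The C implementation discussed next isn't reproduced on this page; a version consistent with the commentary below (the loop-based approach is our assumption) is

double geo_prog(double alpha, int n) {
    double current = 1.0;
    double sum = current;
    int i;
    for (i = 1; i <= n; i++) {
        current = current * alpha;
        sum = sum + current;
    }
    return sum;
}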
If you're not familiar with C, the main thing you should take notice of is the type definitions
int means integer
double means double precision floating point number
the double in double geo_prog(... indicates that the function will return a double
Not surprisingly, the C code is faster than the Python code
A Cython Implementation Cython implementations look like a convex combination of Python
and C
We're going to run our Cython code in the IPython notebook, so we'll start by loading the Cython
extension in a notebook cell
%load_ext cythonmagic
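The Cython code itself isn't reproduced on this page; a sketch consistent with the commentary below is

%%cython
def geo_prog_cython(double alpha, int n):
    cdef double current = 1.0
    cdef double sum = current
    cdef int i
    for i in range(1, n + 1):
        current = current * alpha
        sum = sum + current
    return sum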
Here cdef is a Cython keyword indicating a variable declaration, and is followed by a type
The %%cython line at the top is not actually Cython code; it's an IPython cell magic indicating
the start of Cython code
After executing the cell, you can now call the function geo_prog_cython from within Python
What you are in fact calling is compiled C code that runs at about the same speed as our hand-coded C routine above
Example 2: Cython with NumPy Arrays Let's go back to the first problem that we worked with:
generating the iterates of the quadratic map

xt+1 = 4 xt (1 − xt)
The problem of computing iterates and returning a time series requires us to work with arrays
The natural array type to work with is NumPy arrays
Here's a Cython implementation that initializes, populates and returns a NumPy array

%%cython
import numpy as np

def qm_cython_first_pass(double x0, int n):
    cdef int t
    x = np.zeros(n+1, float)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4.0 * x[t] * (1 - x[t])
    return np.asarray(x)
If you run this code and time it, you will see that its performance is disappointing; nothing like
the speed gain we got from Numba
See qm_numba above
The reason is that working with NumPy arrays still incurs substantial Python overheads
We can do better by using Cythons typed memoryviews, which provide more direct access to
arrays in memory
When using them, the first step is to create a NumPy array
Next, we declare a memoryview and bind it to the NumPy array
Here's an example:

%%cython
import numpy as np
from numpy cimport float_t

def qm_cython(double x0, int n):
    cdef int t
    x_np_array = np.zeros(n+1, dtype=float)
    cdef float_t [:] x = x_np_array
    x[0] = x0
    for t in range(n):
        x[t+1] = 4.0 * x[t] * (1 - x[t])
    return np.asarray(x)

Here
cimport pulls in some compile-time information from NumPy
cdef float_t [:] x = x_np_array creates a memoryview on the NumPy array x_np_array
the return statement uses np.asarray(x) to convert the memoryview back to a NumPy array
On our hardware, the Cython implementation qm_cython runs at about the same speed as
qm_numba
Summary Cython requires more expertise than Numba, and is a little more fiddly in terms of
getting good performance
In fact it's surprising how difficult it is to beat the speed improvements provided by Numba
Nonetheless,
Cython is a very mature, stable and widely used tool
Cython can be more useful than Numba when working with larger, more sophisticated applications
Other Options
There are in fact many other approaches to speeding up your Python code
We mention only a few of the most popular methods
Interfacing with Fortran If you are comfortable writing Fortran you will find it very easy to
create extension modules from Fortran code using F2Py
F2Py is a Fortran-to-Python interface generator that is particularly simple to use
Robert Johansson provides a very nice introduction to F2Py, among other things
Recently, an IPython cell magic for Fortran has been developed; you might want to give it a try
Parallel and Cloud Computing This is a big topic that we won't address in detail yet
However, you might find the following links a useful starting point
IPython for parallel computing
NumbaPro
The Starcluster interface to Amazons EC2
Anaconda Accelerate
Exercises
Exercise 1 Later we'll learn all about finite state Markov chains
For now let's just concentrate on simulating a very simple example of such a chain
Suppose that the volatility of returns on an asset can be in one of two regimes: high or low
The transition probabilities across states are as follows
For example, let the period length be one month, and suppose the current state is high
We see from the graph that the state next month will be
high with probability 0.8
low with probability 0.2
Your task is to simulate a sequence of monthly volatility states according to this rule
Set the length of the sequence to n = 100000 and start in the high state
Implement a pure Python version, a Numba version and a Cython version, and compare speeds
To test your code, evaluate the fraction of time that the chain spends in the low state
If your code is correct, it should be about 2/3
Solutions
Solution notebook
Appendix Other Options There are other important projects aimed at speeding up Python
These include but are not limited to
Pythran : A Python to C++ compiler
Parakeet : A runtime compiler aimed at scientific computing in Python
PyPy : Runtime environment using just-in-time compiler
Nuitka : Another Python compiler
Pyston : Under development, sponsored by Dropbox
CHAPTER
TWO
INTRODUCTORY APPLICATIONS
This section of the course contains relatively simple applications, one purpose of which is to teach
you more about the Python programming environment
Overview
One of the single most useful branches of mathematics you can learn is linear algebra
For example, many applied problems in economics, finance, operations research and other fields
of science require the solution of a linear system of equations, such as
y1 = ax1 + bx2
y2 = cx1 + dx2
or, more generally,
y1 = a11 x1 + a12 x2 + ... + a1k xk
...                                                        (2.1)
yn = an1 x1 + an2 x2 + ... + ank xk
A vector of length n is just a sequence (or array, or tuple) of n numbers, which we write as x =
( x1 , . . . , xn ) or x = [ x1 , . . . , xn ]
We will write these sequences either horizontally or vertically as we please
(Later, when we wish to perform certain matrix operations, it will become necessary to distinguish
between the two)
The set of all n-vectors is denoted by Rn
For example, R2 is the plane, and a vector in R2 is just a point in the plane
Traditionally, vectors are represented visually as arrows from the origin to the point
The following figure represents three vectors in this manner
If youre interested, the Python code for producing this figure is here
Vector Operations The two most common operators for vectors are addition and scalar multiplication, which we now describe
As a matter of definition, when we add two vectors, we add them element by element:

x + y = (x1, ..., xn) + (y1, ..., yn) := (x1 + y1, ..., xn + yn)

Scalar multiplication is an operation that takes a number γ and a vector x and produces

γx := (γx1, γx2, ..., γxn)
In [4]: x + y
Out[4]: array([ 3.,  5.,  7.])

In [5]: 4 * x
Out[5]: array([ 4.,  4.,  4.])
The inner product of vectors x, y ∈ Rⁿ is defined as

x'y := Σ_{i=1}^n xi yi

and the (Euclidean) norm of x, which represents its length, is

||x|| := √(x'x) := ( Σ_{i=1}^n xi² )^{1/2}
In [7]: np.sqrt(np.sum(x**2))
Out[7]: 1.7320508075688772
In [8]: np.linalg.norm(x)
Out[8]: 1.7320508075688772
Span Given a set of vectors A := {a1, ..., ak} in Rⁿ, it's natural to think about the new vectors
we can create by performing linear operations
New vectors created in this manner are called linear combinations of A
In particular, y ∈ Rⁿ is a linear combination of A := {a1, ..., ak} if

y = β1 a1 + ... + βk ak for some scalars β1, ..., βk

In this context, the values β1, ..., βk are called the coefficients of the linear combination
The set of linear combinations of A is called the span of A
The next figure shows the span of A = { a1 , a2 } in R3
The span is a 2 dimensional plane passing through these two points and the origin
The code for producing this figure can be found here
Examples If A contains only one vector a1 ∈ R², then its span is just the scalar multiples of a1,
which is the unique line passing through both a1 and the origin
If A = {e1, e2, e3} consists of the canonical basis vectors of R³, that is

e1 := (1, 0, 0)',   e2 := (0, 1, 0)',   e3 := (0, 0, 1)'

then the span of A is all of R³, because, for any x = (x1, x2, x3) ∈ R³, we can write

x = x1 e1 + x2 e2 + x3 e3
Now consider A0 = {e1 , e2 , e1 + e2 }
If y = (y1 , y2 , y3 ) is any linear combination of these vectors, then y3 = 0 (check it)
Hence A0 fails to span all of R3
Linear Independence As we'll see, it's often desirable to find families of vectors with relatively
large span, so that many vectors can be described by linear operators on a few vectors
The condition we need for a set of vectors to have a large span is whats called linear independence
In particular, a collection of vectors A := { a1 , . . . , ak } in Rn is said to be
linearly dependent if some strict subset of A has the same span as A
linearly independent if it is not linearly dependent
Put differently, a set of vectors is linearly independent if no vector is redundant to the span, and
linearly dependent otherwise
To illustrate the idea, recall the figure that showed the span of vectors {a1, a2} in R³ as a plane
through the origin
If we take a third vector a3 and form the set { a1 , a2 , a3 }, this set will be
linearly dependent if a3 lies in the plane
linearly independent otherwise
As another illustration of the concept, since Rn can be spanned by n vectors (see the discussion of
canonical basis vectors above), any collection of m > n vectors in Rn must be linearly dependent
The following statements are equivalent to linear independence of A := {a1, ..., ak} ⊂ Rⁿ
1. No vector in A can be formed as a linear combination of the other elements
2. If β1 a1 + ... + βk ak = 0 for scalars β1, ..., βk, then β1 = ... = βk = 0
Unique Representations Another nice thing about sets of linearly independent vectors is that
each element in the span has a unique representation as a linear combination of these vectors
In other words, if A := {a1, ..., ak} ⊂ Rⁿ is linearly independent and

y = β1 a1 + ... + βk ak

then no other coefficient sequence γ1, ..., γk will produce the same vector y
Indeed, if we also have y = γ1 a1 + ... + γk ak, then

(β1 − γ1) a1 + ... + (βk − γk) ak = 0

Linear independence now implies γi = βi for all i
Matrices
Matrices are a neat way of organizing data for use in linear operations
An n × k matrix is a rectangular array A of numbers with n rows and k columns:

    ( a11 a12 ... a1k )
A = ( a21 a22 ... a2k )
    (  .   .  ...  .  )
    ( an1 an2 ... ank )
Often, the numbers in the matrix represent coefficients in a system of linear equations, as discussed
at the start of this lecture
For obvious reasons, the matrix A is also called a vector if either n = 1 or k = 1
In the former case, A is called a row vector, while in the latter it is called a column vector
If n = k, then A is called square
The matrix formed by replacing aij by aji for every i and j is called the transpose of A, and denoted
A' or Aᵀ
If A = A', then A is called symmetric
For a square matrix A, the n elements of the form aii for i = 1, ..., n are called the principal diagonal
A is called diagonal if the only nonzero entries are on the principal diagonal
If, in addition to being diagonal, each element along the principal diagonal is equal to 1, then A is
called the identity matrix, and denoted by I
Matrix Operations Just as was the case for vectors, a number of algebraic operations are defined
for matrices
Scalar multiplication and addition are immediate generalizations of the vector case:

     ( a11 ... a1k )    ( γa11 ... γa1k )
γA = (  .  ...  .  ) := (   .  ...   .  )
     ( an1 ... ank )    ( γan1 ... γank )

and

        ( a11 ... a1k )   ( b11 ... b1k )    ( a11 + b11 ... a1k + b1k )
A + B = (  .  ...  .  ) + (  .  ...  .  ) := (     .     ...      .    )
        ( an1 ... ank )   ( bn1 ... bnk )    ( an1 + bn1 ... ank + bnk )
In the latter case, the matrices must have the same shape in order for the definition to make sense
We also have a convention for multiplying two matrices
The rule for matrix multiplication generalizes the idea of inner products discussed above, and is
designed to make multiplication play well with basic linear operations
If A and B are two matrices, then their product AB is formed by taking as its i, j-th element the
inner product of the i-th row of A and the j-th column of B
There are many tutorials to help you visualize this operation, such as this one, or the discussion
on the Wikipedia page
If A is n × k and B is j × m, then to multiply A and B we require k = j, and the resulting matrix
AB is n × m
As perhaps the most important special case, consider multiplying n × k matrix A and k × 1 column
vector x
According to the preceding rule, this gives us an n × 1 column vector

     ( a11 ... a1k ) ( x1 )    ( a11 x1 + ... + a1k xk )
Ax = (  .  ...  .  ) (  .  ) := (          .           )        (2.2)
     ( an1 ... ank ) ( xk )    ( an1 x1 + ... + ank xk )
The shape attribute is a tuple giving the number of rows and columns; see here for more discussion
To get the transpose of A, use A.transpose() or, more simply, A.T
There are many convenient functions for creating common matrices (matrices of zeros, ones, etc.)
see here
Since operations are performed elementwise by default, scalar multiplication and addition have
very natural syntax

In [8]: A = np.identity(3)

In [9]: B = np.ones((3, 3))

In [10]: 2 * A
Out[10]:
array([[ 2.,  0.,  0.],
       [ 0.,  2.,  0.],
       [ 0.,  0.,  2.]])

In [11]: A + B
Out[11]:
array([[ 2.,  1.,  1.],
       [ 1.,  2.,  1.],
       [ 1.,  1.,  2.]])

1 Although there is a specialized matrix data type defined in NumPy, it's more standard to work with ordinary
NumPy arrays. See this discussion.
You can check that this holds for the function f ( x ) = Ax + b when b is the zero vector, and fails
when b is nonzero
In fact, its known that f is linear if and only if there exists a matrix A such that f ( x ) = Ax for all
x.
Solving Systems of Equations Consider the system

y = Ax                                                     (2.3)
The problem we face is to determine a vector x ∈ Rᵏ that solves (2.3), taking y and A as given
This is a special case of a more general problem: find an x such that y = f(x)
Given an arbitrary function f and a y, is there always an x such that y = f(x)?
If so, is it always unique?
The answer to both these questions is negative, as the next figure shows
In the first plot there are multiple solutions, as the function is not one-to-one, while in the second
there are no solutions, since y lies outside the range of f
Can we impose conditions on A in (2.3) that rule out these problems?
In this context, the most important thing to recognize about the expression Ax is that it corresponds to a linear combination of the columns of A
Perhaps the most important fact about determinants is that A is nonsingular if and only if A is of
full column rank
This gives us a useful one-number summary of whether or not a square matrix can be inverted
More Rows than Columns This is the n × k case with n > k
This case is very important in many settings, not least in the setting of linear regression (where n
is the number of observations, and k is the number of explanatory variables)
Given arbitrary y ∈ Rⁿ, we seek an x ∈ Rᵏ such that y = Ax
In this setting, existence of a solution is highly unlikely
Without much loss of generality, lets go over the intuition focusing on the case where the columns
of A are linearly independent
It follows that the span of the columns of A is a k-dimensional subspace of Rn
This set can never be linearly independent, since 2 vectors are enough to span R2
(For example, use the canonical basis vectors)
It follows that one column is a linear combination of the other two
For example, lets say that a1 = a2 + a3
In [15]: y = np.ones((2, 1))  # Column vector

In [16]: A_inv
Out[16]:
array([[-2. ,  1. ],
       [ 1.5, -0.5]])

In [17]: x = np.dot(A_inv, y)  # Solution

In [18]: np.dot(A, x)  # Should equal y
Out[18]:
array([[ 1.],
       [ 1.]])

In [19]: solve(A, y)
Out[19]:
array([[-1.],
       [ 1.]])

Observe how we can solve for x = A⁻¹y either via np.dot(inv(A), y) or by using solve(A, y)
The latter method uses a different algorithm (LU decomposition) that is numerically more stable,
and hence should almost always be preferred
To obtain the least squares solution x̂ = (A'A)⁻¹A'y, use scipy.linalg.lstsq(A, y)
Eigenvalues and Eigenvectors
The eigenvalue equation is equivalent to (A − λI)v = 0, and this has a nonzero solution v only
when the columns of A − λI are linearly dependent
This in turn is equivalent to stating that the determinant is zero
Hence to find all eigenvalues, we can look for λ such that the determinant of A − λI is zero
This problem can be expressed as one of solving for the roots of a polynomial in λ of degree n
This in turn implies the existence of n solutions in the complex plane, although some might be
repeated
Some nice facts about the eigenvalues of a square matrix A are as follows
1. The determinant of A equals the product of the eigenvalues
2. The trace of A (the sum of the elements on the principal diagonal) equals the sum of the
eigenvalues
3. If A is symmetric, then all of its eigenvalues are real
We round out our discussion by briefly mentioning several other important topics

Series Expansions Recall the usual summation formula for a geometric progression, which
states that if |a| < 1, then Σ_{k=0}^∞ a^k = (1 − a)⁻¹
A generalization of this idea exists in the matrix setting
Matrix Norms Let A be a square matrix, and let

||A|| := max_{||x|| = 1} ||Ax||

The norms on the right-hand side are ordinary vector norms, while the norm on the left-hand side
is a matrix norm; in this case, the so-called spectral norm
For example, for a square matrix S, the condition ||S|| < 1 means that S is contractive, in the sense
that it pulls all vectors towards the origin 2
Neumann's Theorem Let A be a square matrix and let A^k := A A^{k−1} with A¹ := A
In other words, A^k is the k-th power of A
Neumann's theorem states the following: If ||A^k|| < 1 for some k ∈ N, then I − A is invertible, and

(I − A)⁻¹ = Σ_{k=0}^∞ A^k                                  (2.4)
Spectral Radius A result known as Gelfand's formula tells us that, for any square matrix A,

ρ(A) = lim_{k→∞} ||A^k||^{1/k}

Here ρ(A) is the spectral radius, defined as max_i |λi|, where {λi}_i is the set of eigenvalues of A
As a consequence of Gelfand's formula, if all eigenvalues are strictly less than one in modulus,
there exists a k with ||A^k|| < 1
In which case (2.4) is valid
Positive Definite Matrices Let A be a symmetric n × n matrix
We say that A is
1. positive definite if x'Ax > 0 for every x ∈ Rⁿ \ {0}

2 Suppose that ||S|| < 1. Take any nonzero vector x, and let r := ||x||. We have ||Sx|| = r ||S(x/r)|| ≤ r ||S|| < r = ||x||.
Hence every point is pulled towards the origin.
Differentiating Linear and Quadratic Forms The following formulas are useful; here x, a, y, z are conformable vectors and A, B are conformable matrices:

1. ∂(a'x)/∂x = a
2. ∂(Ax)/∂x = A'
3. ∂(x'Ax)/∂x = (A + A')x
4. ∂(y'Bz)/∂y = Bz
5. ∂(y'Bz)/∂B = y z'
L = −y'Py − u'Qu + λ'[Ax + Bu − y]

where λ is an n × 1 vector of Lagrange multipliers
Try applying the above formulas for differentiating quadratic and linear forms to obtain the first-order conditions for maximizing L with respect to y, u and minimizing it with respect to λ
Show that these conditions imply that
1. λ = −2Py
2. The optimizing choice of u satisfies u = −(Q + B'PB)⁻¹B'PAx
3. The function v satisfies v(x) = −x'P̃x, where P̃ = A'PA − A'PB(Q + B'PB)⁻¹B'PA
As we will see, in economic contexts Lagrange multipliers often are shadow prices

Note: If we don't care about the Lagrange multipliers, we can substitute the constraint into the
objective function, and then just maximize −(Ax + Bu)'P(Ax + Bu) − u'Qu with respect to u. You
can verify that this leads to the same maximizer.
Further Reading The documentation of the scipy.linalg submodule can be found here
Chapter 2 of these notes contains a discussion of linear algebra along the same lines as above,
with solved exercises
If you don't mind a slightly abstract approach, a nice intermediate-level read on linear algebra is
[Janich94]
Overview
Markov chains are one of the most useful classes of stochastic processes
Attributes:
simple, flexible and supported by many elegant theoretical results
valuable for building intuition about random dynamic models
very useful in their own right
You will find them in many of the workhorse models of economics and finance
In this lecture we review some of the theory of Markov chains, with a focus on numerical methods
Prerequisite knowledge is basic probability and linear algebra
Definitions
Stochastic Matrices A stochastic matrix (or Markov matrix) is an n × n square matrix P = P[i, j]
such that
1. each element P[i, j] is nonnegative, and
2. each row P[i, ·] sums to one
Let S := {0, ..., n − 1}
Evidently, each row P[i, ·] can be regarded as a distribution (probability mass function) on S
It is not difficult to check 3 that if P is a stochastic matrix, then so is the k-th power P^k for all k ∈ N
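As a quick numerical check of this last fact (the specific matrix is just an illustration of ours):

import numpy as np

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])
Pk = np.linalg.matrix_power(P, 5)   # the 5th power of P
print(Pk.sum(axis=1))               # each row still sums to one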
Markov Chains A stochastic matrix describes the dynamics of a Markov chain { Xt } that takes
values in the state space S
Formally, we say that a discrete time stochastic process { Xt } taking values in S is a Markov chain
with stochastic matrix P if
P{Xt+1 = j | Xt = i} = P[i, j]

for any t ≥ 0 and i, j ∈ S; here P means probability
Remark: This definition implies that {Xt} has the Markov property, which is to say that, for any t,

P{Xt+1 | Xt} = P{Xt+1 | Xt, Xt−1, ...}

Thus the state Xt is a complete description of the current position of the system
Thus, by construction,
P[i, j] is the probability of going from i to j in one unit of time (one step)
P[i, ·] is the conditional distribution of Xt+1 given Xt = i
Another way to think about this process is to imagine that, when Xt = i, the next value Xt+1 is
drawn from the i-th row P[i, ·]
Rephrasing this using more algorithmic language
At each t, the new state Xt+1 is drawn from P[Xt, ·]
Example 1 Consider a worker who, at any given time t, is either unemployed (state 0) or employed (state 1)
Let's write this mathematically as Xt = 0 or Xt = 1
Suppose that, over a one month period,
1. An employed worker loses her job and becomes unemployed with probability β ∈ (0, 1)
2. An unemployed worker finds a job with probability α ∈ (0, 1)
In terms of a stochastic matrix, this tells us that

P = ( 1 − α      α   )
    (   β      1 − β )

3 Hint: First show that if P and Q are stochastic matrices then so is their product; to check the row sums, try
postmultiplying by a column vector of ones. Finally, argue that P^n is a stochastic matrix using induction.
Once we have the values α and β, we can address a range of questions, such as
What is the average duration of unemployment?
Over the long-run, what fraction of time does a worker find herself unemployed?
Conditional on employment, what is the probability of becoming unemployed at least once
over the next 12 months?
Etc.
We'll cover such applications below
Example 2 Using US unemployment data, Hamilton [Ham05] estimated the stochastic matrix

     ( 0.971  0.029  0     )
P := ( 0.145  0.778  0.077 )
     ( 0      0.508  0.492 )
where
the frequency is monthly
the first state represents normal growth
the second state represents mild recession
the third state represents severe recession
For example, the matrix tells us that when the state is normal growth, the state will again be
normal growth next month with probability 0.97
In general, large values on the main diagonal indicate persistence in the process { Xt }
This Markov process can also be represented as a directed graph, with edges labeled by transition
probabilities
One of the most natural ways to answer questions about Markov chains is to simulate them
(As usual, to approximate the probability of event E, we can simulate many times and count the
fraction of times that E occurs)
To simulate a Markov chain, we need its stochastic matrix P and a probability distribution ψ for
the initial state
Here ψ is a probability distribution on S with the interpretation that X0 is drawn from ψ
The Markov chain is then constructed via the following two rules
1. At time t = 0, the initial state X0 is drawn from ψ
2. At each subsequent time t, the new state Xt+1 is drawn from P[Xt, ·]
In order to implement this simulation procedure, we need a function for generating draws from a
given discrete distribution
We already have this functionality in hand, in the file discrete_rv.py
The module is part of the QuantEcon package, and defines a class DiscreteRV that can be used as
follows
In [64]: from quantecon import DiscreteRV
In [65]: psi = (0.1, 0.9)
In [66]: d = DiscreteRV(psi)
In [67]: d.draw(5)
Out[67]: array([0, 1, 1, 1, 1])
Here
psi is understood to be a discrete distribution on the set of outcomes 0, ..., len(psi) − 1
d.draw(5) generates 5 independent draws from this distribution
Let's now write a function that generates time series from a specified pair P, ψ
Our function will take the following three arguments
A stochastic matrix P,
An initial state or distribution init
A positive integer sample_size representing the length of the time series the function should
return
Let's allow init to either be
an integer in 0, ..., n − 1 providing a fixed starting value for X0, or
a discrete distribution on this same set that corresponds to the initial distribution ψ
In the latter case, a random starting value for X0 is drawn from the distribution init
The function should return a time series (sample path) of length sample_size
One solution to this problem can be found in file mc_tools.py from the QuantEcon package
The relevant function is mc_sample_path
Let's see how it works using the small matrix

P := ( 0.4  0.6 )
     ( 0.2  0.8 )                                          (2.5)
It happens to be true that, for a long series drawn from P, the fraction of the sample that takes
value 0 will be about 0.25; we'll see why later on
If you run the following code you should get roughly that answer
import numpy as np
from quantecon import mc_sample_path
P = np.array([[.4, .6], [.2, .8]])
s = mc_sample_path(P, init=(0.5, 0.5), sample_size=100000)
print((s == 0).mean()) # Should be about 0.25
Marginal Distributions
Suppose that
1. { Xt } is a Markov chain with stochastic matrix P
2. the distribution of Xt is known to be t
What then is the distribution of Xt+1 , or, more generally, of Xt+m ?
(Motivation for these questions is given below)
Solution Let's consider how to solve for the distribution ψt+m of Xt+m, beginning with the case
m = 1
Throughout, ψt will refer to the distribution of Xt for all t
Hence our first aim is to find ψt+1 given ψt and P
To begin, pick any j ∈ S.
Using the law of total probability, we can decompose the probability that Xt+1 = j as follows:

P{Xt+1 = j} = Σ_{i∈S} P{Xt+1 = j | Xt = i} · P{Xt = i}

(In words, to get the probability of being at j tomorrow, we account for all ways this can happen
and sum their probabilities)
Rewriting this statement in terms of marginal and conditional probabilities gives

ψt+1[j] = Σ_{i∈S} P[i, j] ψt[i]

If we think of ψt+1 and ψt as row vectors, these n equations are summarized by ψt+1 = ψt P
Iterating, we have in general

X0 ~ ψ0  ⟹  Xm ~ ψ0 P^m                                    (2.6)
Xt ~ ψt  ⟹  Xt+m ~ ψt P^m                                   (2.7)
Note: Unless stated otherwise, we follow the common convention in the Markov chain literature
that distributions are row vectors
Example: Powers of a Markov Matrix We know that the probability of transitioning from i to j
in one step is P[i, j]
It turns out that the probability of transitioning from i to j in m steps is P^m[i, j], the [i, j]-th
element of the m-th power of P
To see why, consider again (2.7), but now with ψt putting all probability on state i
If we regard ψt as a vector, it is a vector with 1 in the i-th position and zero elsewhere
Inserting this into (2.7), we see that, conditional on Xt = i, the distribution of Xt+m is the i-th row
of P^m
In particular

P{Xt+m = j | Xt = i} = P^m[i, j] = [i, j]-th element of P^m
Example: Future Probabilities Recall the stochastic matrix P for recession and growth considered
above
Suppose that the current state is unknown; perhaps statistics are available only at the end of the
current month
We estimate the probability that the economy is in state i to be ψ[i]
The probability of being in recession (state 1 or state 2) in 6 months time is given by the inner
product

ψ P⁶ · (0, 1, 1)'
Stationary Distributions As stated in the previous section, we can shift probabilities forward one unit of time via postmultiplication by P
Some distributions are invariant under this updating process; for example

In [2]: P = np.array([[.4, .6], [.2, .8]])
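The session is truncated at this point; continuing it to exhibit an invariant distribution (using ψ* = (0.25, 0.75), which we compute below) would look like

In [3]: psi = (0.25, 0.75)

In [4]: np.dot(psi, P)
Out[4]: array([ 0.25,  0.75])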
For further details on uniqueness and uniform ergodicity, see, for example, EDTC, theorem 4.3.18
Example Recall our model of employment / unemployment dynamics for a given worker discussed above
Assuming α ∈ (0, 1) and β ∈ (0, 1), the uniform ergodicity condition is satisfied
Let ψ* = (p, 1 − p) be the stationary distribution, so that p corresponds to unemployment (state 0)
Using ψ* = ψ* P and a bit of algebra yields

p = β / (α + β)

This is, in some sense, a steady state probability of unemployment; more on interpretation below
Not surprisingly it tends to zero as β → 0, and to one as α → 0
Calculating Stationary Distributions As discussed above, a given Markov matrix P can have
many stationary distributions
That is, there can be many row vectors ψ such that ψ = ψP
In fact if P has two distinct stationary distributions ψ1, ψ2 then it has infinitely many, since in this
case, as you can verify,

ψ3 := λ ψ1 + (1 − λ) ψ2

is a stationary distribution for P for any λ ∈ [0, 1]
If we restrict attention to the case where only one stationary distribution exists, one option for
finding it is to try to solve the linear system ψ(In − P) = 0 for ψ, where In is the n × n identity
But the zero vector solves this equation
Hence we need to impose the restriction that the solution must be a probability distribution
One function that will do this for us and implement a suitable algorithm is mc_compute_stationary
from mc_tools.py
Let's test it using the matrix (2.5)
import numpy as np
from quantecon import mc_compute_stationary
P = np.array([[.4, .6], [.2, .8]])
print(mc_compute_stationary(P))
If you run this you should find that the unique stationary distribution is (0.25, 0.75)
Convergence to Stationarity Let P be a stochastic matrix such that the uniform ergodicity assumption is valid
We know that under this condition there is a unique stationary distribution ψ*
In fact, under the same condition, we have another important result: for any nonnegative row
vector ψ summing to one (i.e., distribution),

ψ P^t → ψ*    as t → ∞                                     (2.8)

In view of our preceding discussion, this states that the distribution of Xt converges to ψ*, regardless
of the distribution of X0
This adds considerable weight to our interpretation of ψ* as a stochastic steady state
For one of several well-known proofs, see EDTC, theorem 4.3.18
The convergence in (2.8) is illustrated in the next figure
Here
P is the stochastic matrix for recession and growth considered above
The highest red dot is an arbitrarily chosen initial probability distribution ψ, represented as
a vector in R³
The other red dots are the distributions ψ P^t for t = 1, 2, ...
The black dot is ψ*
The code for the figure can be found in the file examples/mc_convergence_plot.py in the main
repository; you might like to try experimenting with different initial conditions
Ergodicity
Under the very same condition of uniform ergodicity, yet another important result obtains: If
1. {Xt} is a Markov chain with stochastic matrix P
2. ψ* is the unique stationary distribution
then, for all j ∈ S,

(1/n) Σ_{t=1}^n 1{Xt = j} → ψ*[j]    as n → ∞              (2.9)

Here
1{Xt = j} = 1 if Xt = j and zero otherwise
convergence is with probability one
the result does not depend on the distribution (or value) of X0
The result tells us that the fraction of time the chain spends at state j converges to ψ*[j] as time
goes to infinity. This gives us another way to interpret the stationary distribution, provided that
the convergence result in (2.9) is valid
Technically, the convergence in (2.9) is a special case of a law of large numbers result for Markov
chains; see EDTC, section 4.3.4 for details
Example Recall our cross-sectional interpretation of the employment / unemployment model
discussed above
Assume that α ∈ (0, 1) and β ∈ (0, 1), so the uniform ergodicity condition is satisfied
We saw that the stationary distribution is (p, 1 − p), where

p = β / (α + β)
for k = 0, 1, 2, ..., and

E[ Σ_{j=0}^∞ β^j y_{t+j} | xt = ei ] = [(I − βP)⁻¹ y]_i

where

(I − βP)⁻¹ = I + βP + β²P² + ...

Premultiplication by (I − βP)⁻¹ amounts to applying the resolvent operator
Exercises

Exercise 1 According to the discussion above, if a worker's employment dynamics obey the stochastic matrix

P = ( 1 − α      α   )
    (   β      1 − β )

with α ∈ (0, 1) and β ∈ (0, 1), then, in the long-run, the fraction of time spent unemployed will be

p := β / (α + β)
Exercise 2 A topic of interest for economics and many other disciplines is ranking
Let's now consider one of the most practical and important ranking problems: the rank assigned
to web pages by search engines
(Although the problem is motivated from outside of economics, there is in fact a deep connection
between search ranking systems and prices in certain competitive equilibria see [DLP13])
To understand the issue, consider the set of results returned by a query to a web search engine
For the user, it is desirable to
1. receive a large set of accurate matches
2. have the matches returned in order, where the order corresponds to some measure of importance
Ranking according to a measure of importance is the problem we now consider
The methodology developed to solve this problem by Google founders Larry Page and Sergey
Brin is known as PageRank
To illustrate the idea, consider the following diagram
Imagine that this is a miniature version of the WWW, with
each node representing a web page
each arrow representing the existence of a link from one page to another
Now let's think about which pages are likely to be important, in the sense of being valuable to a
search engine user
One possible criterion for importance of a page is the number of inbound links, an indication of
popularity
By this measure, m and j are the most important pages, with 5 inbound links each
However, what if the pages linking to m, say, are not themselves important?
Thinking this way, it seems appropriate to weight the inbound nodes by relative importance
The PageRank algorithm does precisely this
A slightly simplified presentation that captures the basic idea is as follows
Letting j be (the integer index of) a typical page and rj be its ranking, we set

rj = Σ_{i ∈ Lj} ri / ℓi

where
ℓi is the total number of outbound links from i
Lj is the set of all pages i such that i has a link to j
This is a measure of the number of inbound links, weighted by their own ranking (and normalized
by 1/`i )
There is, however, another interpretation, and it brings us back to Markov chains
Let P be the matrix given by P[i, j] = 1{i → j} / ℓi where 1{i → j} = 1 if i has a link to j and zero
otherwise
The matrix P is a stochastic matrix provided that each page has at least one link
With this definition of P we have

rj = Σ_{i ∈ Lj} ri / ℓi = Σ_{all i} 1{i → j} ri / ℓi = Σ_{all i} P[i, j] ri

Writing r for the row vector of rankings, this becomes r = rP
Hence r is a stationary distribution of the stochastic matrix P (up to normalization)
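One simple way to compute such a ranking numerically, sketched here under the assumption that every page has at least one outbound link (so that P is a well-defined stochastic matrix), is power iteration on r ← rP:

import numpy as np

def compute_ranking(P, tol=1e-10):
    # Iterate r <- r P; the limit is the stationary distribution of P
    n = P.shape[0]
    r = np.full(n, 1.0 / n)          # start from the uniform distribution
    while True:
        r_new = np.dot(r, P)
        if np.max(np.abs(r_new - r)) < tol:
            return r_new
        r = r_new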
When you solve for the ranking, you will find that the highest ranked node is in fact g, while the
lowest is a
Exercise 3 In numerical work it is sometimes convenient to replace a continuous model with a
discrete one
In particular, Markov chains are routinely generated as discrete approximations to AR(1) processes
of the form

yt+1 = ρ yt + ut+1

Here ut is assumed to be iid and N(0, σu²)
The variance of the stationary probability distribution of {yt} is

σy² := σu² / (1 − ρ²)

Tauchen's method [Tau86] is the most common method for approximating this continuous state
process with a finite state Markov chain
Solution notebook
Overview
The shortest path problem is a classic problem in mathematics and computer science with applications in
Economics (sequential decision making, analysis of social networks, etc.)
Operations research and transportation
Robotics and artificial intelligence
Telecommunication network design and routing
Etc., etc.
For us, the shortest path problem also provides a simple introduction to the logic of dynamic
programming, which is one of our key topics
Variations of the methods we discuss are used millions of times every day, in applications such as
Google Maps
Outline of the Problem
The shortest path problem is one of finding how to traverse a graph from one specified node to
another at minimum cost
Consider the following graph
Note that J(G) = 0
Intuitively, the best path can now be found as follows
Start at A
From node v, move to any node that solves

min_{w ∈ Fv} { c(v, w) + J(w) }                            (2.10)

where
Fv is the set of nodes that can be reached from v in one step
c(v, w) is the cost of traveling from v to w
Hence, if we know the function J, then finding the best path is almost trivial
But how to find J?
Some thought will convince you that, for every node v, the function J satisfies

J(v) = min_{w ∈ Fv} { c(v, w) + J(w) }                     (2.11)
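One standard way to compute J is to iterate on (2.11) starting from an initial guess; a sketch, assuming a dict-of-dicts representation of the graph (our own choice of data structure), is

def compute_J(costs, destination):
    # costs[v] is a dict mapping each neighbor w of v to the travel cost c(v, w)
    J = {v: float('inf') for v in costs}
    J[destination] = 0
    while True:
        next_J = {v: 0 if v == destination
                  else min(c + J[w] for w, c in costs[v].items())
                  for v in costs}
        if next_J == J:   # no change: J solves (2.11) at every node
            return J
        J = next_J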
Exercise 1 Use the algorithm given above to find the optimal path (and its cost) for this graph
Here the line node0, node1 0.04, node8 11.11, node14 72.21 means that from node0 we can
go to
node1 at cost 0.04
node8 at cost 11.11
node14 at cost 72.21
and so on
According to our calculations, the optimal path and its cost are like this
Your code should replicate this result
Solution notebook
Outline
In 1969, Thomas C. Schelling developed a simple but striking model of racial segregation [Sch69]
His model studies the dynamics of racially mixed neighborhoods
Like much of Schelling's work, the model shows how local interactions can lead to surprising
aggregate structure
In particular, it shows that relatively mild preference for neighbors of similar race can lead in
aggregate to the collapse of mixed neighborhoods, and high levels of segregation
In recognition of this and other research, Schelling was awarded the 2005 Nobel Prize in Economic
Sciences (joint with Robert Aumann)
In this lecture we (in fact you) will build and run a version of Schelling's model
The Model
We will cover a variation of Schelling's model that is easy to program and captures the main idea
Set Up Suppose we have two types of people: Orange people and green people
For the purpose of this lecture, we will assume there are 250 of each type
These agents all live on a single unit square
The location of an agent is just a point ( x, y), where 0 < x, y < 1
Preferences We will say that an agent is happy if half or more of her 10 nearest neighbors are of
the same type
Here nearest is in terms of Euclidean distance
An agent who is not happy is called unhappy
An important point here is that agents are not averse to living in mixed areas
They are perfectly happy if half their neighbors are of the other color
Behavior Initially, agents are mixed together (integrated)
In particular, the initial location of each agent is an independent draw from a bivariate uniform
distribution on S = (0, 1)2
Now, cycling through the set of all agents, each agent is now given the chance to stay or move
We assume that each agent will stay put if they are happy and move if unhappy
The algorithm for moving is as follows
1. Draw a random location in S
2. If happy at new location, move there
3. Else, go to step 1
In this way, we cycle continuously through the agents, moving as required
We continue to cycle until no-one wishes to move
Results
Lets have a look at the results we got when we coded and ran this model
As discussed above, agents are initially mixed randomly together
But after several cycles they become segregated into distinct regions
In this instance, the program terminated after 4 cycles through the set of agents, indicating that all
agents had reached a state of happiness
What is striking about the pictures is how rapidly racial integration breaks down
This is despite the fact that people in the model don't actually mind living mixed with the other
type
Even with these preferences, the outcome is a high degree of segregation
Exercises
Rather than show you the program that generated these figures, we'll now ask you to write your
own version
You can see our program at the end, when you look at the solution
Exercise 1 Implement and run this simulation for yourself
Consider the following structure for your program
Agents are modeled as objects
(Have a look at this lecture if you've forgotten how to build your own objects)
Here's an indication of how they might look
* Data:
    * type (green or orange)
    * location
* Methods:
    * Determine whether happy or not given locations of other agents
    * If not happy, move
        * find a new location where happy
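A minimal class skeleton along these lines (names and details are our placeholders, not the solution itself) might be

import random

class Agent:

    def __init__(self, type):
        self.type = type              # 'green' or 'orange'
        self.draw_location()

    def draw_location(self):
        # an independent uniform draw from the unit square
        self.location = (random.uniform(0, 1), random.uniform(0, 1))

    def distance(self, other):
        dx = self.location[0] - other.location[0]
        dy = self.location[1] - other.location[1]
        return (dx**2 + dy**2)**0.5

    def happy(self, agents):
        # happy if at least half of the 10 nearest neighbors share our type
        others = [a for a in agents if a is not self]
        neighbors = sorted(others, key=self.distance)[:10]
        return sum(a.type == self.type for a in neighbors) >= 5

    def update(self, agents):
        # keep drawing new locations until happy
        while not self.happy(agents):
            self.draw_location()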
Solution notebook
This lecture illustrates two of the most important theorems of probability and statistics: The law
of large numbers (LLN) and the central limit theorem (CLT)
These beautiful theorems lie behind many of the most fundamental results in econometrics and
quantitative economic modeling
The lecture is based around simulations that show the LLN and CLT in action
We also demonstrate how the LLN and CLT break down when the assumptions they are based on
do not hold
In addition, we examine several useful extensions of the classical theorems, such as
The delta method, for smooth functions of random variables
The multivariate case
Some of these extensions are presented as exercises
Relationships
We begin with the law of large numbers, which tells us when sample averages will converge to
their population means
The Classical LLN The classical law of large numbers concerns independent and identically
distributed (IID) random variables
Here is the strongest version of the classical LLN, known as Kolmogorov's strong law
Let X1, ..., Xn be independent and identically distributed scalar random variables, with common
distribution F
When it exists, let µ denote the common mean of this sample:

µ := E X = ∫ x F(dx)

In addition, let

X̄n := (1/n) Σ_{i=1}^n Xi

Kolmogorov's strong law states that, if E|X| is finite, then

P { X̄n → µ as n → ∞ } = 1                                 (2.13)
The intuition comes from considering the variance of the sample mean. If each Xi has finite variance σ², then

E[ (X̄n − µ)² ] = E[ ( (1/n) Σ_{i=1}^n (Xi − µ) )² ]
              = (1/n²) Σ_{i=1}^n Σ_{j=1}^n E[ (Xi − µ)(Xj − µ) ]
              = (1/n²) Σ_{i=1}^n E[ (Xi − µ)² ]
              = σ² / n                                      (2.15)

Here the crucial step is at the third equality, which follows from independence
Independence means that if i ≠ j, then the covariance term E[(Xi − µ)(Xj − µ)] drops out
As a result, n² − n terms vanish, leading us to a final expression that goes to zero in n
Infinite Mean What happens if the condition E|X| < ∞ in the statement of the LLN is not
satisfied?
This might be the case if the underlying distribution is heavy tailed; the best known example is
the Cauchy distribution, which has density

f(x) = 1 / (π (1 + x²))        (x ∈ R)

The next figure shows 100 independent draws from this distribution
Notice how extreme observations are far more prevalent here than in the previous figure
Let's now have a look at the behavior of the sample mean
Here we've increased n to 1000, but the sequence still shows no sign of converging
Will convergence become visible if we take n even larger?
The answer is no
To see this, recall that the characteristic function of the Cauchy distribution is

φ(t) = E e^{itX} = e^{−|t|}                                (2.17)
Next we turn to the central limit theorem, which tells us about the distribution of the deviation
between sample averages and population means
Statement of the Theorem The central limit theorem is one of the most remarkable results in all
of mathematics
In the classical IID setting, it tells us the following: If the sequence X1, ..., Xn is IID, with common
mean µ and common variance σ² ∈ (0, ∞), then

√n ( X̄n − µ ) →d N(0, σ²)    as n → ∞                      (2.18)

Here →d N(0, σ²) indicates convergence in distribution to a centered (i.e., zero mean) normal with
standard deviation σ
Intuition The striking implication of the CLT is that for any distribution with finite second moment, the simple operation of adding independent copies always leads to a Gaussian curve
A relatively simple proof of the central limit theorem can be obtained by working with characteristic functions (see, e.g., theorem 9.5.6 of [Dud02])
The proof is elegant but almost anticlimactic, and it provides surprisingly little intuition
In fact all of the proofs of the CLT that we know are similar in this respect
Why does adding independent copies produce a bell-shaped distribution?
Part of the answer can be obtained by investigating addition of independent Bernoulli random
variables
In particular, let Xi be binary, with P{ Xi = 0} = P{ Xi = 1} = 0.5, and let X1 , . . . , Xn be independent
When n = 1, the distribution is flat: one success or no successes have the same probability
When n = 2 we can either have 0, 1 or 2 successes
Notice the peak in probability mass at the mid-point k = 1
The reason is that there are more ways to get 1 success (fail then succeed or succeed then fail)
than to get zero or two successes
Moreover, the two trials are independent, so the outcomes fail then succeed and succeed then
fail are just as likely as the outcomes fail then fail and succeed then succeed
(If there was positive correlation, say, then succeed then fail would be less likely than succeed
then succeed)
Here, already we have the essence of the CLT: addition under independence leads probability
mass to pile up in the middle and thin out at the tails
For n = 4 and n = 8 we again get a peak at the middle value (halfway between the minimum
and the maximum possible value)
The intuition is the same there are simply more ways to get these middle outcomes
If we continue, the bell-shaped curve becomes ever more pronounced
We are witnessing the binomial approximation of the normal distribution
Simulation 1 Since the CLT seems almost magical, running simulations that verify its implications is one good way to build intuition
The fit to the normal density is already tight, and can be further improved by increasing n
You can also experiment with other specifications of F
Note: You might need to delete or modify the lines beginning with rc to get this code to run on
your computer
In the figure, the closest density is that of Y1 , while the furthest is that of Y5
The Multivariate Case The law of large numbers and central limit theorem work just as nicely in multidimensional settings

To state them, let X be a random vector taking values in R^k with mean vector

E[X] := ( E[X_1], E[X_2], ..., E[X_k] )′ =: μ

and variance-covariance matrix

Σ := Var[X] =
[ E[(X_1 − μ_1)(X_1 − μ_1)]  ···  E[(X_1 − μ_1)(X_k − μ_k)] ]
[ E[(X_2 − μ_2)(X_1 − μ_1)]  ···  E[(X_2 − μ_2)(X_k − μ_k)] ]
[            ⋮                              ⋮               ]
[ E[(X_k − μ_k)(X_1 − μ_1)]  ···  E[(X_k − μ_k)(X_k − μ_k)] ]   (2.19)

If X_1, ..., X_n is a sequence of IID random vectors, each with mean vector μ and variance-covariance matrix Σ, and X̄_n denotes the vector of sample means, then

√n (X̄_n − μ) →d N(0, Σ)   as n → ∞   (2.20)
Exercises
Exercise 1 One very useful consequence of the central limit theorem is as follows
Assume the conditions of the CLT as stated above
If g : R → R is differentiable at μ and g′(μ) ≠ 0, then

√n { g(X̄_n) − g(μ) } →d N(0, g′(μ)² σ²)   as n → ∞   (2.21)
This theorem is used frequently in statistics to obtain the asymptotic distribution of estimators
many of which can be expressed as functions of sample means
(These kinds of results are often said to use the delta method)
The proof is based on a Taylor expansion of g around the point μ
Taking the result as given, let the distribution F of each X_i be uniform on [0, π/2] and let g(x) = sin(x)

Derive the asymptotic distribution of √n { g(X̄_n) − g(μ) } and illustrate convergence in the same spirit as the program illustrate_clt.py discussed above

What happens when you replace [0, π/2] with [0, π]?
What is the source of the problem?
Exercise 2 Heres a result thats often used in developing statistical tests, and is connected to the
multivariate central limit theorem
If you study econometric theory, you will see this result used again and again
Assume the setting of the multivariate CLT discussed above, so that
1. X1 , . . . , Xn is a sequence of IID random vectors, each taking values in Rk
2. μ := E[X_i], and Σ is the variance-covariance matrix of X_i

3. The convergence

√n (X̄_n − μ) →d N(0, Σ)   (2.22)

is valid
In a statistical setting, one often wants the right hand side to be standard normal, so that confidence intervals are easily computed
This normalization can be achieved on the basis of three observations
First, if X is a random vector in R^k and A is constant and k × k, then

Var[AX] = A Var[X] A′

Second, by the continuous mapping theorem, if Z_n →d Z, then

A Z_n →d A Z

Third, if S is a k × k symmetric positive definite matrix, then there exists a symmetric positive definite matrix Q, called the inverse square root of S, such that

Q S Q′ = I
Here I is the k k identity matrix
Putting these things together, your first exercise is to show that if Q is the inverse square root of Σ, then

Z_n := √n Q (X̄_n − μ) →d Z ∼ N(0, I)
Applying the continuous mapping theorem one more time tells us that
‖Z_n‖² →d ‖Z‖²

Given the distribution of Z, we conclude that

n ‖Q (X̄_n − μ)‖² →d χ²(k)   (2.23)
Your second exercise is to illustrate the convergence in (2.23) with a simulation, taking

X_i := ( W_i, U_i + W_i )′

where

• each W_i is an IID draw from the uniform distribution on [−1, 1]
• each U_i is an IID draw from the uniform distribution on [−2, 2]
• U_i and W_i are independent of each other

Hints:

1. scipy.linalg.sqrtm(A) computes the square root of A. You still need to invert it

2. You should be able to work out the variance-covariance matrix Σ from the preceding information
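As a quick illustration of the first hint, here is a small sketch (our own; the matrix S below is hypothetical, not the Σ of the exercise) showing how to form the inverse square root:

import numpy as np
from scipy.linalg import inv, sqrtm

# A hypothetical symmetric positive definite matrix, for illustration
S = np.array([[1.0, 0.5],
              [0.5, 2.0]])

Q = inv(sqrtm(S))   # the inverse square root of S
print(np.allclose(Q.dot(S).dot(Q.T), np.identity(2)))   # True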
Solutions
Solution notebook
Objects in play

• An n × 1 vector x_t denoting the state at time t = 0, 1, 2, ...
• An m × 1 vector of iid shocks w_{t+1} ∼ N(0, I)
• A k × 1 vector y_t of observations at time t = 0, 1, 2, ...
• An n × n matrix A called the transition matrix
• An n × m matrix C called the volatility matrix
• A k × n matrix G sometimes called the output matrix

Here is the linear state-space system

x_{t+1} = A x_t + C w_{t+1}
y_t = G x_t   (2.24)
x_0 ∼ N(μ_0, Σ_0)
Much of what follows in fact requires only the martingale difference condition

E[w_{t+1} | x_t, x_{t−1}, ...] = 0

This is a weaker condition than that {w_t} is iid with w_{t+1} ∼ N(0, I)
Examples By appropriate choice of the primitives, a variety of dynamics can be represented in
terms of the linear state space model
The following examples help to highlight this point
They also illustrate the wise dictum "finding the state is an art"

Second-order difference equation Let {y_t} be a deterministic sequence that satisfies

y_{t+1} = φ_0 + φ_1 y_t + φ_2 y_{t−1}   s.t.   y_0, y_{−1} given

To map this into our linear system (2.24), we set

x_t = (1, y_t, y_{t−1})′,   A = [[1, 0, 0], [φ_0, φ_1, φ_2], [0, 1, 0]],   C = (0, 0, 0)′,   G = [0, 1, 0]   (2.25)
You can confirm that under these definitions, (2.24) and (2.25) agree
The next figure shows dynamics of this process when φ_0 = 1.1, φ_1 = 0.8, φ_2 = −0.8, y_0 = y_{−1} = 1
Later youll be asked to recreate this figure
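As a preview, here is a minimal sketch of how the figure can be produced with the LSS class presented later in this lecture (the import path assumes the QuantEcon package; the simulation length is our own choice):

import numpy as np
import matplotlib.pyplot as plt
from quantecon import LSS   # the class presented later in this lecture

phi_0, phi_1, phi_2 = 1.1, 0.8, -0.8
A = [[1,     0,     0],
     [phi_0, phi_1, phi_2],
     [0,     1,     0]]
C = np.zeros((3, 1))        # no shocks: the sequence is deterministic
G = [0, 1, 0]

ar = LSS(A, C, G, mu_0=np.ones(3))   # encodes y_0 = y_{-1} = 1
x, y = ar.simulate(ts_length=50)

fig, ax = plt.subplots()
ax.plot(y.flatten(), lw=2, alpha=0.7)
ax.set_xlabel('time')
plt.show()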
Univariate Autoregressive Processes We can use (2.24) to represent the model

y_{t+1} = φ_1 y_t + φ_2 y_{t−1} + φ_3 y_{t−2} + φ_4 y_{t−3} + σ w_{t+1}   (2.26)

where {w_t} is iid and standard normal. To put this in the linear state space format we take x_t = (y_t, y_{t−1}, y_{t−2}, y_{t−3})′ and

A = [[φ_1, φ_2, φ_3, φ_4], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]],   C = (σ, 0, 0, 0)′,   G = [1, 0, 0, 0]

The matrix A has the form of the companion matrix to the vector (φ_1, φ_2, φ_3, φ_4)

The next figure shows dynamics of this process when

φ_1 = 0.5, φ_2 = −0.2, φ_3 = 0, φ_4 = 0.5, σ = 0.2, y_0 = y_{−1} = y_{−2} = y_{−3} = 1
Vector Autoregressions Now suppose that y_t is a k × 1 vector, the coefficients φ_1, ..., φ_4 are k × k matrices, and σ is k × k; then (2.26) is termed a vector autoregression, and we can map it into (2.24) by stacking:

x_t = (y_t, y_{t−1}, y_{t−2}, y_{t−3})′,   A = [[φ_1, φ_2, φ_3, φ_4], [I, 0, 0, 0], [0, I, 0, 0], [0, 0, I, 0]],   C = (σ, 0, 0, 0)′,   G = [I, 0, 0, 0]

where I is the k × k identity matrix
Seasonals We can use (2.24) to represent

1. the deterministic seasonal y_t = y_{t−4}
2. the indeterministic seasonal y_t = φ_4 y_{t−4} + w_t

In fact both are special cases of (2.26), with φ_1 = φ_2 = φ_3 = 0. For the deterministic seasonal we also have φ_4 = 1 and σ = 0, so that

A = [[0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]

The eigenvalues are (1, −1, i, −i), and so have period four⁴

The resulting sequence oscillates deterministically with period four, and can be used to model deterministic seasonals in quarterly time series

The indeterministic seasonal produces recurrent, but aperiodic, seasonal fluctuations.
Time Trends The model y_t = at + b is known as a linear time trend

We can represent this model in the linear state space form by taking

A = [[1, 1], [0, 1]],   C = (0, 0)′,   G = [a, b]   (2.27)

and starting at initial condition x_0 = (0, 1)′
⁴ For example, note that i = cos(π/2) + i sin(π/2), so the period associated with i is 2π / (π/2) = 4.
In fact it's possible to use the state-space system to represent polynomial trends of any order

For instance, let

x_0 = (0, 0, 1)′,   A = [[1, 1, 0], [0, 1, 1], [0, 0, 1]],   C = (0, 0, 0)′

It follows that

A^t = [[1, t, t(t−1)/2], [0, 1, t], [0, 0, 1]]

Then x_t′ = (t(t−1)/2, t, 1), so that x_t contains linear and quadratic time trends
Martingales with Drift As a variation on the linear time trend model, consider y_t = t + b + Σ_{j=0}^{t} w_j with w_0 = 0

To modify (2.27) accordingly, we set

A = [[1, 1], [0, 1]],   C = (1, 0)′,   G = [1, b]   (2.28)

By iterating (2.24) backwards from time t, the state can be expressed as

x_t = Σ_{j=0}^{t−1} A^j C w_{t−j} + A^t x_0   (2.29)

In the present case, the first component of the state is

x_{1t} = Σ_{j=0}^{t−1} w_{t−j} + (1, t) x_0 = Σ_{j=1}^{t} w_j + t
The first term on the right is a cumulated sum of martingale differences, and is therefore a martingale
The second term is a translated linear function of time
For this reason, x1t is called a martingale with drift
Distributions and Moments
Unconditional Moments Using (2.24), it's easy to obtain expressions for the (unconditional) mean of x_t and y_t

We'll explain what unconditional and conditional mean soon

Letting μ_t := E[x_t] and using linearity of expectations, we find that

μ_{t+1} = A μ_t   with μ_0 given   (2.30)

As for the variance-covariance matrix Σ_t := Var[x_t], a similar calculation based on (2.24) gives

Σ_{t+1} = A Σ_t A′ + C C′   with Σ_0 given   (2.31)
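These recursions are trivial to implement. Here is a minimal sketch (our own; the function name is hypothetical) that iterates (2.30) and (2.31) forward:

import numpy as np

def moment_paths(A, C, mu_0, Sigma_0, T):
    # Iterate the recursions (2.30) and (2.31) forward T periods
    A, C = np.atleast_2d(A), np.atleast_2d(C)
    mu = np.asarray(mu_0, dtype=float)
    Sigma = np.asarray(Sigma_0, dtype=float)
    mus, Sigmas = [mu], [Sigma]
    for _ in range(T):
        mu = A.dot(mu)                               # (2.30)
        Sigma = A.dot(Sigma).dot(A.T) + C.dot(C.T)   # (2.31)
        mus.append(mu)
        Sigmas.append(Sigma)
    return mus, Sigmas

The moment_sequence method of the LSS class shown below performs the same iteration as a generator.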
Recall the following useful fact about (multivariate) Gaussian distributions: if u ∼ N(ū, S) and v = a + Bu, then

v ∼ N(a + Bū, B S B′)   (2.32)
In particular, given our Gaussian assumptions on the primitives and the linearity of (2.24) we can see immediately that both x_t and y_t are Gaussian for all t ≥ 0⁵
Since x_t is Gaussian, to find the distribution, all we need to do is find its mean and variance-covariance matrix

But in fact we've already done this, in (2.30) and (2.31)
Letting μ_t and Σ_t be as defined by these equations, we have

x_t ∼ N(μ_t, Σ_t)   and   y_t ∼ N(G μ_t, G Σ_t G′)   (2.33)
In the right-hand figure, these values are converted into a rotated histogram that shows relative
frequencies from our sample of 20 y T s
⁵ The correct way to argue this is by induction. Suppose that x_t is Gaussian. Then (2.24) and (2.32) imply that x_{t+1} is Gaussian. Since x_0 is assumed to be Gaussian, it follows that every x_t is Gaussian. Evidently this implies that each y_t is Gaussian.
(The parameters and source code for the figures can be found in file examples/paths_and_hist.py
from the main repository)
Here is another figure, this time with 100 observations
Lets now try with 500,000 observations, showing only the histogram (without rotation)
In this setting, the ensemble means

ȳ_T := (1/I) Σ_{i=1}^{I} y_T^i   and   x̄_T := (1/I) Σ_{i=1}^{I} x_T^i

approximate the population means E[y_T] = G μ_T and E[x_T] = μ_T respectively, with the approximation improving as the number of sample paths I → ∞
what's the probability that the process {y_t} exceeds some value a before falling below b?

etc., etc.

Such questions concern the joint distributions of these sequences

To compute the joint distribution of x_0, x_1, ..., x_T, recall that joint and conditional densities are linked by the rule

p(x, y) = p(y | x) p(x)

From this rule we get p(x_0, x_1, ..., x_T) = p(x_0) ∏_{t=0}^{T−1} p(x_{t+1} | x_t)

Autocovariance Functions An important object related to the joint distribution is the autocovariance function, which in our setting takes the form

Σ_{t+j,t} := E[(x_{t+j} − μ_{t+j})(x_t − μ_t)′] = A^j Σ_t   (2.35)

Notice that Σ_{t+j,t} in general depends on both j, the gap between the two dates, and t, the earlier date
Stationarity and Ergodicity
Two properties that greatly aid analysis of linear state space models when they hold are stationarity and ergodicity
Lets start with the intuition
Visualizing Stability Let's look at some more time series from the same model that we analyzed above

This picture shows cross-sectional distributions for y at times T, T′, T″

Note how the time series "settle down" in the sense that the distributions at T′ and T″ are relatively similar to each other but unlike the distribution at T

In essence, the distributions of y_t are converging to a fixed long-run distribution as t → ∞
Stationary Distributions In our setting, a distribution ψ∞ is called stationary for x_t if

x_t ∼ ψ∞ implies x_{t+1} ∼ ψ∞

Since

1. in the present case all distributions are Gaussian

2. a Gaussian distribution is pinned down by its mean and variance-covariance matrix

we can restate the definition as follows: ψ∞ is stationary for x_t if

ψ∞ = N(μ∞, Σ∞)

where μ∞ and Σ∞ are fixed points of (2.30) and (2.31) respectively
Covariance Stationary Processes Let's see what happens to the preceding figure if we start x_0 at the stationary distribution

Now the differences in the observed distributions at T, T′ and T″ come entirely from random fluctuations due to the finite sample size

By

• our choosing x_0 ∼ N(μ∞, Σ∞)

• the definitions of μ∞ and Σ∞ as fixed points of (2.30) and (2.31) respectively

we've ensured that

μ_t = μ∞   and   Σ_t = Σ∞   for all t

Moreover, in view of (2.35), the autocovariance function takes the form Σ_{t+j,t} = A^j Σ∞, which depends on j but not on t
This motivates the following definition
A process {x_t} is said to be covariance stationary if its first and second moments are constant over time, with the autocovariance Σ_{t+j,t} depending only on the gap j

In the stable case considered next, the moments also converge to their stationary values from arbitrary initial conditions:

μ_t → μ∞   and   Σ_t → Σ∞   as t → ∞
Processes with a constant state component To investigate such a process, suppose that A and C take the form

A = [[A_1, a], [0, 1]],   C = [[C_1], [0]]

where

• A_1 is an (n−1) × (n−1) matrix
• a is an (n−1) × 1 column vector
• C_1 is an (n−1) × m matrix

Let x_t = (x_{1t}′, 1)′ where x_{1t} is (n−1) × 1
It follows that

x_{1,t+1} = A_1 x_{1t} + a + C_1 w_{t+1}

Let μ_{1t} := E[x_{1t}] and take expectations on both sides of this expression to get

μ_{1,t+1} = A_1 μ_{1t} + a   (2.36)

Assume now that the moduli of the eigenvalues of A_1 are all strictly less than one

Then (2.36) has a unique stationary solution, namely,

μ_{1∞} = (I − A_1)^{−1} a

The stationary value of μ_t itself is then μ∞ := (μ_{1∞}′, 1)′
The stationary values of Σ_t and Σ_{t+j,t} satisfy

Σ∞ = A Σ∞ A′ + C C′
Σ_{t+j,t} = A^j Σ∞   (2.37)

Notice that Σ_{t+j,t} depends on the time gap j but not on calendar time t
In conclusion, if

• x_0 ∼ N(μ∞, Σ∞) and
• the moduli of the eigenvalues of A_1 are all strictly less than unity

then the {x_t} process is covariance stationary, with constant state component
Note: If the eigenvalues of A_1 are less than unity in modulus, then (a) starting from any initial value, the mean and variance-covariance matrix both converge to their stationary values; and (b) iterations on (2.31) converge to the fixed point of the discrete Lyapunov equation in the first line of (2.37)
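SciPy can solve the discrete Lyapunov equation directly, which provides a useful check on iteration of (2.31). A minimal sketch, with a hypothetical stable A and C of our own choosing:

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Hypothetical stable transition matrix and volatility matrix
A = np.array([[0.8, -0.1],
              [0.2,  0.7]])
C = np.array([[0.4],
              [0.2]])

# Sigma_infty solves Sigma = A Sigma A' + C C' (first line of (2.37))
Sigma_inf = solve_discrete_lyapunov(A, C.dot(C.T))

# Check against direct iteration on (2.31) from Sigma_0 = 0
Sigma = np.zeros((2, 2))
for _ in range(2000):
    Sigma = A.dot(Sigma).dot(A.T) + C.dot(C.T)
print(np.allclose(Sigma, Sigma_inf))   # True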
Ergodicity Let's suppose that we're working with a covariance stationary process

In this case we know that the ensemble mean will converge to μ∞ as the sample size I approaches infinity
Averages over time Ensemble averages across simulations are interesting theoretically, but in real life we usually observe only a single realization {x_t, y_t}_{t=0}^{T}
So now lets take a single realization and form the time series averages
x̄_T := (1/T) Σ_{t=1}^{T} x_t   and   ȳ_T := (1/T) Σ_{t=1}^{T} y_t
Do these time series averages converge to something interpretable in terms of our basic state-space
representation?
For this we require something called ergodicity
Ergodicity is the property that time series and ensemble averages coincide
More formally, ergodicity implies that time series sample averages converge to their expectation
under the stationary distribution
In particular,

(1/T) Σ_{t=1}^{T} x_t → μ∞

(1/T) Σ_{t=1}^{T} (x_t − x̄_T)(x_t − x̄_T)′ → Σ∞

(1/T) Σ_{t=1}^{T} (x_{t+j} − x̄_T)(x_t − x̄_T)′ → A^j Σ∞
In our linear Gaussian setting, any covariance stationary process is also ergodic
Prediction
The theory of prediction for linear state space systems is elegant and simple
Forecasting Formulas: Conditional Means The natural way to predict variables is to use conditional distributions

For example, the optimal forecast of x_{t+1} given information known at time t is

E_t[x_{t+1}] := E[x_{t+1} | x_t, x_{t−1}, ..., x_0] = A x_t

More generally,

j-step ahead forecast of x:   E_t[x_{t+j}] := E[x_{t+j} | x_t, x_{t−1}, ..., x_0] = A^j x_t

j-step ahead forecast of y:   E_t[y_{t+j}] := E[y_{t+j} | x_t, x_{t−1}, ..., x_0] = G A^j x_t
Covariance of Prediction Errors It is useful to obtain the covariance matrix of the vector of j-step-ahead prediction errors

x_{t+j} − E_t[x_{t+j}] = Σ_{s=0}^{j−1} A^s C w_{t+j−s}   (2.38)

Evidently,

V_j := E_t[(x_{t+j} − E_t[x_{t+j}])(x_{t+j} − E_t[x_{t+j}])′] = Σ_{k=0}^{j−1} A^k C C′ (A^k)′   (2.39)

V_j can also be calculated recursively, via V_1 = C C′ and

V_j = C C′ + A V_{j−1} A′,   j ≥ 2   (2.40)
V_j is the conditional covariance matrix of the errors in forecasting x_{t+j}, conditioned on time t information x_t

Under particular conditions, V_j converges to

V∞ = C C′ + A V∞ A′   (2.41)
Forecasts of Geometric Sums In several contexts, we want to compute forecasts of geometric sums of future observations:

• forecast of a geometric sum of future x's, or E[ Σ_{j=0}^{∞} β^j x_{t+j} | x_t ]

• forecast of a geometric sum of future y's, or E[ Σ_{j=0}^{∞} β^j y_{t+j} | x_t ]
These objects are important components of some famous and interesting dynamic models
For example,

• if {y_t} is a stream of dividends, then E[ Σ_{j=0}^{∞} β^j y_{t+j} | x_t ] is a model of a stock price

• if {y_t} is the money supply, then E[ Σ_{j=0}^{∞} β^j y_{t+j} | x_t ] is a model of the price level
Formulas Fortunately, it is easy to use a little matrix algebra to compute these objects

Suppose that every eigenvalue of A has modulus strictly less than β^{−1}

It then follows that I + βA + β²A² + ··· = [I − βA]^{−1}, and hence

E_t[ Σ_{j=0}^{∞} β^j x_{t+j} ] = [I + βA + β²A² + ···] x_t = [I − βA]^{−1} x_t

E_t[ Σ_{j=0}^{∞} β^j y_{t+j} ] = G [I + βA + β²A² + ···] x_t = G [I − βA]^{−1} x_t
Code
Our preceding simulations and calculations are based on code in the file lss.py from the QuantEcon package
The code implements a class for handling linear state space models (simulations, calculating moments, etc.)
We repeat it here for convenience
"""
Authors: Thomas J. Sargent, John Stachurski
Filename: lss.py
Computes quantities related to the Gaussian linear state space model
x_{t+1} = A x_t + C w_{t+1}
y_t = G x_t
The shocks {w_t} are iid and N(0, I)
"""
import numpy as np
from numpy.random import multivariate_normal


class LSS:
    """
    A class that describes the Gaussian linear state space model (2.24),
    with transition matrix A, volatility matrix C and output matrix G.
    """

    def __init__(self, A, C, G, mu_0=None, Sigma_0=None):
        self.A, self.C, self.G = self.convert(A), self.convert(C), self.convert(G)
        self.n = self.A.shape[0]    # dimension of the state
        self.m = self.C.shape[1]    # number of shocks
        self.k = self.G.shape[0]    # dimension of the observations
if mu_0 is None:
self.mu_0 = np.zeros((self.n, 1))
else:
self.mu_0 = np.asarray(mu_0)
if Sigma_0 is None:
self.Sigma_0 = np.zeros((self.n, self.n))
else:
self.Sigma_0 = Sigma_0
def convert(self, x):
"""
Convert array_like objects (lists of lists, floats, etc.) into
well formed 2D NumPy arrays
"""
return np.atleast_2d(np.asarray(x, dtype='float32'))
def simulate(self, ts_length=100):
"""
Simulate a time series of length ts_length, first drawing
x_0 ~ N(mu_0, Sigma_0)
Parameters
----------
ts_length : scalar(int), optional(default=100)
    The length of the simulation

Returns
-------
x : array_like(float)
    An n x ts_length array, where the t-th column is x_t
y : array_like(float)
    A k x ts_length array, where the t-th column is y_t
"""
x = np.empty((self.n, ts_length))
x[:, 0] = multivariate_normal(self.mu_0.flatten(), self.Sigma_0)
w = np.random.randn(self.m, ts_length-1)
for t in range(ts_length-1):
x[:, t+1] = self.A.dot(x[:, t]) + self.C.dot(w[:, t])
y = self.G.dot(x)
return x, y
def replicate(self, T=10, num_reps=100):
"""
Simulate num_reps observations of x_T and y_T given
x_0 ~ N(mu_0, Sigma_0).
Parameters
----------
T : scalar(int), optional(default=10)
The period that we want to replicate values for
num_reps : scalar(int), optional(default=100)
The number of replications that we want
Returns
-------
x : array_like(float)
    An n x num_reps array, where the j-th column is the j-th
    observation of x_T
y : array_like(float)
    A k x num_reps array, where the j-th column is the j-th
    observation of y_T
"""
x = np.empty((self.n, num_reps))
for j in range(num_reps):
x_T, _ = self.simulate(ts_length=T+1)
x[:, j] = x_T[:, -1]
y = self.G.dot(x)
return x, y
def moment_sequence(self):
"""
Create a generator to calculate the population mean and
variance-covariance matrix for both x_t and y_t, starting at
the initial condition (self.mu_0, self.Sigma_0). Each iteration
produces a 4-tuple of items (mu_x, mu_y, Sigma_x, Sigma_y) for
the next period.
Yields
------
mu_x : array_like(float)
    An n x 1 array representing the population mean of x_t
mu_y : array_like(float)
    A k x 1 array representing the population mean of y_t
Sigma_x : array_like(float)
    An n x n array representing the variance-covariance matrix of x_t
Sigma_y : array_like(float)
    A k x k array representing the variance-covariance matrix of y_t
"""
# == Simplify names == #
A, C, G = self.A, self.C, self.G
# == Initial moments == #
mu_x, Sigma_x = self.mu_0, self.Sigma_0
while 1:
mu_y, Sigma_y = G.dot(mu_x), G.dot(Sigma_x).dot(G.T)
yield mu_x, mu_y, Sigma_x, Sigma_y
            # == Update moments of x == #
            mu_x = A.dot(mu_x)
            Sigma_x = A.dot(Sigma_x).dot(A.T) + C.dot(C.T)

    def stationary_distributions(self, max_iter=200, tol=1e-5):
        """
        Compute the moments of the stationary distributions of x_t
        and y_t by iterating the moment sequence to convergence.
        """
# == Initialize iteration == #
m = self.moment_sequence()
mu_x, mu_y, Sigma_x, Sigma_y = next(m)
i = 0
error = tol + 1
# == Loop until convergence or failure == #
while error > tol:
if i > max_iter:
fail_message = 'Convergence failed after {} iterations'
raise ValueError(fail_message.format(max_iter))
else:
i += 1
mu_x1, mu_y1, Sigma_x1, Sigma_y1 = next(m)
error_mu = np.max(np.abs(mu_x1 - mu_x))
error_Sigma = np.max(np.abs(Sigma_x1 - Sigma_x))
error = max(error_mu, error_Sigma)
mu_x, Sigma_x = mu_x1, Sigma_x1
# == Prepare return values == #
mu_x_star, Sigma_x_star = mu_x, Sigma_x
mu_y_star, Sigma_y_star = mu_y1, Sigma_y1
return mu_x_star, mu_y_star, Sigma_x_star, Sigma_y_star
Exercise 1 Replicate this figure using the LSS class from lss.py
Exercise 2 Replicate this figure modulo randomness using the same class
Exercise 3 Replicate this figure modulo randomness using the same class
The state space model and parameters are the same as for the preceding exercise
Exercise 4 Replicate this figure modulo randomness using the same class
The state space model and parameters are the same as for the preceding exercise, except that the
initial condition is the stationary distribution
Hint: You can use the stationary_distributions method to get the initial conditions
The number of sample paths is 80, and the time horizon in the figure is 100
Producing the vertical bars and dots is optional, but if you wish to try, the bars are at dates 10, 50
and 75
Solutions
Solution notebook
Overview
This lecture provides a simple and intuitive introduction to the Kalman filter, for those who either

• have heard of the Kalman filter but don't know how it works, or
• know the Kalman filter equations, but don't know where they come from
For additional (more advanced) reading on the Kalman filter, see
[LS12], section 2.7.
[AM05]
The last reference gives a particularly clear and comprehensive treatment of the Kalman filter
Required knowledge: Familiarity with matrix manipulations, multivariate normal distributions,
covariance matrices, etc.
The Basic Idea
The Kalman filter has many applications in economics, but for now let's pretend that we are rocket scientists

A missile has been launched from country Y and our mission is to track it

Let x ∈ R² denote the current location of the missile, a pair indicating latitude-longitude coordinates on a map
At the present moment in time, the precise location x is unknown, but we do have some beliefs
about x
One way to summarize our knowledge is a point prediction x̂
But what if the President wants to know the probability that the missile is currently over the
Sea of Japan?
Better to summarize our initial beliefs with a bivariate probability density p
∫_E p(x) dx indicates the probability that we attach to the missile being in region E

The density p is called our prior for the random variable x
To keep things tractable, we will always assume that our prior is Gaussian. In particular, we take

p = N(x̂, Σ)   (2.42)

where x̂ is the mean of the distribution and Σ is a 2 × 2 covariance matrix. In our simulations, we will suppose that

x̂ = (0.2, −0.2)′,   Σ = [[0.4, 0.3], [0.3, 0.45]]   (2.43)
This density p(x) is shown below as a contour map, with the center of the red ellipse being equal to x̂
The Filtering Step We are now presented with some good news and some bad news
The good news is that the missile has been located by our sensors, which report that the current location is y = (2.3, −1.9)
The next figure shows the original prior p( x ) and the new reported location y
The bad news is that our sensors are imprecise.
In particular, we should interpret the output of our sensor not as y = x, but rather as

y = G x + v,   where   v ∼ N(0, R)   (2.44)
Figure 2.1: Prior density (Click this or any other figure to enlarge.)
Here G and R are 2 2 matrices with R positive definite. Both are assumed known, and the noise
term v is assumed to be independent of x
How then should we combine our prior p(x) = N(x̂, Σ) and this new information y to improve our understanding of the location of the missile?
As you may have guessed, the answer is to use Bayes' theorem, which tells us we should update our prior p(x) to p(x | y) via

p(x | y) = p(y | x) p(x) / p(y)

where p(y) = ∫ p(y | x) p(x) dx
In solving for p(x | y), we observe that

• p(x) = N(x̂, Σ)
• In view of (2.44), the conditional density p(y | x) is N(Gx, R)
• p(y) does not depend on x, and enters into the calculations only as a normalizing constant
Because we are in a linear and Gaussian framework, the updated density can be computed by
calculating population linear regressions.
In particular, the solution is known⁶ to be

p(x | y) = N(x̂^F, Σ^F)

where

x̂^F := x̂ + Σ G′ (G Σ G′ + R)^{−1} (y − G x̂)   and   Σ^F := Σ − Σ G′ (G Σ G′ + R)^{−1} G Σ   (2.45)

Here Σ G′ (G Σ G′ + R)^{−1} is the matrix of population regression coefficients of the hidden object x − x̂ on the surprise y − G x̂
This new density p(x | y) = N(x̂^F, Σ^F) is shown in the next figure via contour lines and the color map

The original density is left in as contour lines for comparison

Our new density "twists" the prior p(x) in a direction determined by the new information y − G x̂

In generating the figure, we set G to the identity matrix and R = 0.5Σ for Σ defined in (2.43)
(The code for generating this and the preceding figures can be found in the file examples/gaussian_contours.py from the main repository)
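For concreteness, here is a minimal numerical sketch of the filtering step (2.45) under the settings used in the figure (the variable names are our own):

import numpy as np
from numpy.linalg import inv

# Prior (2.43), with G = I and R = 0.5 * Sigma as in the figure
x_hat = np.array([0.2, -0.2])
Sigma = np.array([[0.4, 0.3],
                  [0.3, 0.45]])
G = np.identity(2)
R = 0.5 * Sigma
y = np.array([2.3, -1.9])

# Filtering step (2.45)
M = Sigma.dot(G.T).dot(inv(G.dot(Sigma).dot(G.T) + R))   # regression coefficients
x_hat_F = x_hat + M.dot(y - G.dot(x_hat))
Sigma_F = Sigma - M.dot(G).dot(Sigma)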
The Forecast Step What have we achieved so far?
We have obtained probabilities for the current location of the state (missile) given prior and current
information
This is called "filtering" rather than forecasting, because we are filtering out noise rather than looking into the future
⁶ See, for example, page 93 of [Bis06]. To get from his expressions to the ones used above, you will also need to apply the Woodbury matrix identity.
Suppose now that the state evolves over time according to the linear law of motion

x_{t+1} = A x_t + w_{t+1},   where   w_t ∼ N(0, Q)   (2.46)
Our aim is to combine this law of motion and our current distribution p(x | y) = N(x̂^F, Σ^F) to come up with a new predictive distribution for the location one unit of time hence

In view of (2.46), all we have to do is introduce a random vector x^F ∼ N(x̂^F, Σ^F) and work out the distribution of A x^F + w, where w is independent of x^F and has distribution N(0, Q)
Since linear combinations of Gaussians are Gaussian, Ax F + w is Gaussian
Elementary calculations and the expressions in (2.45) tell us that

E[A x^F + w] = A E[x^F] + E[w] = A x̂^F = A x̂ + A Σ G′ (G Σ G′ + R)^{−1} (y − G x̂)

and

Var[A x^F + w] = A Var[x^F] A′ + Q = A Σ^F A′ + Q = A Σ A′ − A Σ G′ (G Σ G′ + R)^{−1} G Σ A′ + Q
The matrix A Σ G′ (G Σ G′ + R)^{−1} is often written as K_Σ and called the Kalman gain

(The subscript Σ has been added to remind us that K_Σ depends on Σ, but not on y or x̂)

Using this notation, we can summarize our results as follows: Our updated prediction is the density N(x̂_new, Σ_new) where

x̂_new := A x̂ + K_Σ (y − G x̂)
Σ_new := A Σ A′ − K_Σ G Σ A′ + Q   (2.47)
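The full update (2.47) fits in a few lines of NumPy. A minimal sketch (our own; the function name is hypothetical, and the Kalman class shown below plays the same role in the QuantEcon package):

import numpy as np
from numpy.linalg import inv

def kalman_update(A, G, Q, R, x_hat, Sigma, y):
    # One full step: map the prior N(x_hat, Sigma) and measurement y
    # into the next-period prior, as in (2.47)
    K = A.dot(Sigma).dot(G.T).dot(inv(G.dot(Sigma).dot(G.T) + R))  # Kalman gain
    x_hat_new = A.dot(x_hat) + K.dot(y - G.dot(x_hat))
    Sigma_new = A.dot(Sigma).dot(A.T) - K.dot(G).dot(Sigma).dot(A.T) + Q
    return x_hat_new, Sigma_new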
Repeating this procedure period by period, with the time t prior N(x̂_t, Σ_t) and measurement y_t, gives the updating equations

x̂_{t+1} = A x̂_t + K_{Σ_t} (y_t − G x̂_t)
Σ_{t+1} = A Σ_t A′ − K_{Σ_t} G Σ_t A′ + Q   (2.48)

These are the standard dynamic equations for the Kalman filter. See, for example, [LS12], page 58.

Convergence

The matrix Σ_t measures the uncertainty associated with the prediction x̂_t. Writing the second recursion in (2.48) out in full gives

Σ_{t+1} = A Σ_t A′ − A Σ_t G′ (G Σ_t G′ + R)^{−1} G Σ_t A′ + Q   (2.49)

A natural question is whether Σ_t converges; under suitable conditions it converges to the solution Σ∞ of the fixed point equation

Σ∞ = A Σ∞ A′ − A Σ∞ G′ (G Σ∞ G′ + R)^{−1} G Σ∞ A′ + Q   (2.50)

which is a version of the discrete time Riccati equation
The class Kalman from the QuantEcon package implements the Kalman filter
The class bundles together
Instance data:

• The parameters A, G, Q, R of a given model
• the moments (x̂_t, Σ_t) of the current prior

Methods:

• a method prior_to_filtered to update (x̂_t, Σ_t) to (x̂_t^F, Σ_t^F)
• a method filtered_to_forecast to update the filtering distribution to the predictive distribution, which becomes the new prior (x̂_{t+1}, Σ_{t+1})
• an update method, which combines the last two methods
• a stationary_values method, which computes the solution to (2.50) and the corresponding (stationary) Kalman gain
You can view the program on GitHub but we repeat it here for convenience
"""
Filename: kalman.py
Authors: Thomas Sargent, John Stachurski
Implements the Kalman filter for a linear Gaussian state space model.
"""
import numpy as np
from numpy import dot
from scipy.linalg import inv
from .matrix_eqn import solve_discrete_riccati
class Kalman:
r"""
Implements the Kalman filter for the Gaussian state space model
.. math::
x_{t+1} &= A x_t + w_{t+1}\\
y_t &= G x_t + v_t.
Here :math:`x_t` is the hidden state and :math:`y_t` is the
measurement. The shocks :math:`w_t` and :math:`v_t` are iid zero
mean Gaussians with covariance matrices Q and R respectively.
Parameters
----------
A : array_like or scalar(float)
    The n x n matrix A
Q : array_like or scalar(float)
    Q is n x n, symmetric and nonnegative definite
G : array_like or scalar(float)
    G is k x n
R : array_like or scalar(float)
    R is k x k, symmetric and nonnegative definite
"""

# (The constructor and the prior_to_filtered and filtered_to_forecast
# methods are omitted here; see the full listing on GitHub)

def update(self, y):
    """
    Updates the moments (x_hat, Sigma) of the time t prior to the
    time t+1 prior, using current measurement y_t.  The full
    update, from one period to the next

    Parameters
    ----------
    y : np.ndarray
        A k x 1 ndarray y representing the current measurement
    """
    self.prior_to_filtered(y)
    self.filtered_to_forecast()
def stationary_values(self):
    """
    Computes the limit of :math:`Sigma_t` as :math:`t \to \infty` by
    solving the associated Riccati equation.  Computation is via the
    doubling algorithm (see the documentation in
    `matrix_eqn.solve_discrete_riccati`).

    Returns
    -------
    Sigma_infinity : array_like or scalar(float)
        The infinite limit of Sigma_t
    K_infinity : array_like or scalar(float)
        The stationary Kalman gain.
    """
# === simplify notation === #
A, Q, G, R = self.A, self.Q, self.G, self.R
# === solve Riccati equation, obtain Kalman gain === #
Sigma_infinity = solve_discrete_riccati(A.T, G.T, Q, R)
temp1 = dot(dot(A, Sigma_infinity), G.T)
temp2 = inv(dot(G, dot(Sigma_infinity, G.T)) + R)
K_infinity = dot(temp1, temp2)
return Sigma_infinity, K_infinity
Exercises
Exercise 1 Consider the following simple application of the Kalman filter, loosely based on [LS12], section 2.9.2

Suppose that

• all variables are scalars
• the hidden state {x_t} is in fact constant, equal to some θ ∈ R unknown to the modeler

State dynamics are therefore given by (2.46) with A = 1, Q = 0 and x_0 = θ

The measurement equation is y_t = θ + v_t where v_t is N(0, 1) and iid

The task of this exercise is to simulate the model and, using the code from kalman.py, plot the first five predictive densities p_t(x) = N(x̂_t, Σ_t)

As shown in [LS12], sections 2.9.1-2.9.2, these distributions asymptotically put all mass on the unknown value θ

In the simulation, take θ = 10, x̂_0 = 8 and Σ_0 = 1
Your figure should (modulo randomness) look something like this
Exercise 2 The preceding figure gives some support to the idea that probability mass converges
to
To get a better idea, choose a small ε > 0 and calculate

z_t := 1 − ∫_{θ−ε}^{θ+ε} p_t(x) dx

for t = 0, 1, 2, ..., T

Plot z_t against t, setting ε = 0.1 and T = 600
Your figure should show the error declining erratically, something like this
Exercise 3 As discussed above, if the shock sequence {w_t} is not degenerate, then it is not in general possible to predict x_t without error at time t − 1 (and this would be the case even if we could observe x_{t−1})
Let's now compare the prediction x̂_t made by the Kalman filter against a competitor who is allowed to observe x_{t−1}

This competitor will use the conditional expectation E[x_t | x_{t−1}], which in this case is A x_{t−1}

The conditional expectation is known to be the optimal prediction method in terms of minimizing mean squared error

(More precisely, the minimizer of E ‖x_t − g(x_{t−1})‖² with respect to g is g*(x_{t−1}) := E[x_t | x_{t−1}])
Thus we are comparing the Kalman filter against a competitor who has more information (in the
sense of being able to observe the latent state) and behaves optimally in terms of minimizing
squared error
Our horse race will be assessed in terms of squared error
In particular, your task is to generate a graph plotting observations of both ‖x_t − A x_{t−1}‖² and ‖x_t − x̂_t‖² against t for t = 1, ..., 50
For the parameters, set G = I, R = 0.5I and Q = 0.3I, where I is the 2 2 identity
Set

A = [[0.5, 0.4], [0.6, 0.3]]

To initialize the prior density, set

Σ_0 = [[0.9, 0.3], [0.3, 0.9]]

and x̂_0 = (8, 8)
Observe how, after an initial learning period, the Kalman filter performs quite well, even relative
to the competitor who predicts optimally with knowledge of the latent state
Exercise 4 Try varying the coefficient 0.3 in Q = 0.3I up and down
Observe how the diagonal values in the stationary solution (see (2.50)) increase and decrease in
line with this coefficient
The interpretation is that more randomness in the law of motion for xt causes more (permanent)
uncertainty in prediction
Solutions
Solution notebook
Contents
Infinite Horizon Dynamic Programming
Overview
An Optimal Growth Model
Dynamic Programming
Computation
Writing Reusable Code
Exercises
Solutions
Overview
In a previous lecture we gained some intuition about finite stage dynamic programming by studying the shortest path problem
The aim of this lecture is to introduce readers to methods for solving simple infinite-horizon dynamic programming problems using Python
We will also introduce and motivate some of the modeling choices used throughout the lectures
to treat this class of problems
The particular application we will focus on is solving for consumption in an optimal growth model
Although the model is quite specific, the key ideas extend to many other problems in dynamic
optimization
The model is also very simplistic: we favor ease of exposition over realistic assumptions throughout the current lecture
Other References For supplementary reading see
[LS12], section 3.1
EDTC, section 6.2 and chapter 10
[Sun96], chapter 12
[SLP89], chapters 2-5
[HLL96], all
An Optimal Growth Model
Consider an agent who owns at time t capital stock k_t ∈ R_+ := [0, ∞) and produces output

y_t := f(k_t) ∈ R_+
This output can either be consumed or saved as capital for next period
For simplicity we assume that depreciation is total, so that next period capital is just output minus consumption:

k_{t+1} = y_t − c_t   (2.51)

Taking k_0 as given, we suppose that the agent wishes to maximize

Σ_{t=0}^{∞} β^t u(c_t)   (2.52)

where u is a given utility function, β ∈ (0, 1) is a discount factor, and consumption must satisfy 0 ≤ c_t ≤ y_t

We will focus on the case where consumption is chosen according to c_t = σ(k_t) for all t
In other words, the current control is a fixed (i.e., time homogeneous) function of the current state
The Policy Function Approach As it turns out, we are better off seeking the function σ directly, rather than the optimal consumption sequence

The main reason is that the functional approach, seeking the optimal policy, translates directly over to the stochastic case, whereas the sequential approach does not

For this model, we will say that a function σ mapping R_+ into R_+ is a feasible consumption policy if it satisfies

σ(k) ≤ f(k)   for all k ∈ R_+   (2.53)

The set of all such policies will be denoted by Σ
Using this notation, the agent's decision problem can be rewritten as

max_{σ ∈ Σ}  Σ_{t=0}^{∞} β^t u(σ(k_t))   (2.54)

where {k_t} is given by

k_{t+1} = f(k_t) − σ(k_t),   k_0 given   (2.55)

In the next section we discuss how to solve this problem for the maximizing σ
To this end, let the value of policy σ at k_0 be

v_σ(k_0) := Σ_{t=0}^{∞} β^t u(σ(k_t))   (2.56)

where {k_t} is given by (2.55), and define the value function as

v*(k_0) := sup_{σ ∈ Σ} v_σ(k_0)   (2.57)

The value function gives the maximal value that can be obtained from state k_0, after considering all feasible policies

A policy σ ∈ Σ is called optimal if it attains the supremum in (2.57) for all k_0 ∈ R_+

The Bellman equation for this problem takes the form

v*(k) = max_{0 ≤ c ≤ f(k)} { u(c) + β v*(f(k) − c) }   for all k ∈ R_+   (2.58)
It states that maximal value from a given state can be obtained by trading off current reward from
a given action against the (discounted) future value of the state resulting from that action
(If the intuition behind the Bellman equation is not clear to you, try working through this lecture)
As a matter of notation, given a continuous function w on R_+, we say that policy σ is w-greedy if σ(k) is a solution to

max_{0 ≤ c ≤ f(k)} { u(c) + β w(f(k) − c) }   (2.59)

for every k ∈ R_+
Theoretical Results As with most optimization problems, conditions for existence of a solution
typically require some form of continuity and compactness
In addition, some restrictions are needed to ensure that the sum of discounted utility is always
finite
For example, if we are prepared to assume that f and u are continuous and u is bounded, then
1. The value function v* is finite, bounded, continuous and satisfies the Bellman equation

2. At least one optimal policy exists

3. A policy is optimal if and only if it is v*-greedy
The Bellman Operator The Bellman equation suggests computing v* by taking a guess w and updating it via the right-hand side of (2.58). More formally, for continuous bounded w, define Tw by

Tw(k) = max_{0 ≤ c ≤ f(k)} { u(c) + β w(f(k) − c) }   (2.60)

The operator T is called the Bellman operator; under the boundedness assumptions above, its iterates converge uniformly to v*
In this lecture we will use both bounded and unbounded utility functions without dwelling on
the theory
Computation
Let's now look at computing the value function and the optimal policy
Fitted Value Iteration The first step is to compute the value function by iterating with the Bellman operator
In theory, the algorithm is as follows
1. Begin with a function w, an initial condition
2. Solving (2.60), obtain the function Tw
3. Unless some stopping condition is satisfied, set w = Tw and go to step 2
However, there is a problem we must confront before we implement this procedure: The iterates
can neither be calculated exactly nor stored on a computer
To see the issue, consider (2.60)
Even if w is a known function, unless Tw can be shown to have some special structure, the only way to store this function is to record the value Tw(k) for every k ∈ R_+
Clearly this is impossible
What we will do instead is use fitted value function iteration
The procedure is to record the value of the function Tw at only finitely many grid points {k_1, ..., k_I} ⊂ R_+, and reconstruct it from this information when required
More precisely, the algorithm will be
1. Begin with an array of values {w1 , . . . , w I }, typically representing the values of some initial
function w on the grid points {k1 , . . . , k I }
2. Build a function ŵ on the state space R_+ by interpolating the points {w_1, ..., w_I}

3. By repeatedly solving (2.60), obtain and record the value T ŵ(k_i) on each grid point k_i

4. Unless some stopping condition is satisfied, set {w_1, ..., w_I} = {T ŵ(k_1), ..., T ŵ(k_I)} and go to step 2
How should we go about step 2?
This is a problem of function approximation, and there are many ways to approach it
Whats important here is that the function approximation scheme must not only produce a good
approximation to Tw, but also combine well with the broader iteration algorithm described above
One good choice on both counts is continuous piecewise linear interpolation (see this paper for further discussion)
The next figure illustrates piecewise linear interpolation of an arbitrary function on grid points
0, 0.2, 0.4, . . . , 1
Another advantage of piecewise linear interpolation is that it preserves useful shape properties
such as monotonicity and concavity / convexity
A First Pass Implementation Let's now look at an implementation of fitted value function iteration using Python

In the example below,

• f(k) = k^α with α = 0.65
• u(c) = ln c and β = 0.95
As is well known (see [LS12], section 3.1.2), for this particular problem an exact analytical solution is available, with

v*(k) = c_1 + c_2 ln k   (2.61)

where

c_1 := ln(1 − αβ) / (1 − β) + αβ ln(αβ) / [(1 − αβ)(1 − β)]   and   c_2 := α / (1 − αβ)
At this stage, our only aim is to see if we can replicate this solution numerically, using fitted value
function iteration
Here's a first-pass solution, the details of which are explained below

The code can be found in file examples/optgrowth_v0.py from the main repository
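The full listing of optgrowth_v0.py is not reproduced here; the following is a minimal sketch in the same spirit, using the interpolation and minimization routines that the comments below refer to (the iteration count is our own choice):

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import fminbound

alpha, beta = 0.65, 0.95
grid_max, grid_size = 2, 150
grid = np.linspace(1e-6, grid_max, grid_size)

def bellman_operator(w):
    # Apply T from (2.60) at each grid point, using linear
    # interpolation of (grid, w) to evaluate w off the grid
    Tw = np.empty(grid_size)
    for i, k in enumerate(grid):
        objective = lambda c: -np.log(c) - beta * np.interp(k**alpha - c, grid, w)
        c_star = fminbound(objective, 1e-6, k**alpha)
        Tw[i] = -objective(c_star)
    return Tw

w = 5 * np.log(grid) - 25      # initial condition
for i in range(35):            # number of iterations: our own choice
    w = bellman_operator(w)

plt.plot(grid, w)
plt.show()

(Here np.interp is the same linear interpolation routine that SciPy re-exports as interp.)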
If we increase n and run again we see further improvement; the next figure shows n = 75

Incidentally, it is true that knowledge of the functional form of v* for this model has influenced our choice of the initial condition

w = 5 * log(grid) - 25
In more realistic problems such information is not available, and convergence will probably take
longer
Comments on the Code The function bellman_operator implements steps 2-3 of the fitted value function algorithm discussed above

Linear interpolation is performed by SciPy's interp function

Like the rest of SciPy's numerical solvers, fminbound minimizes its objective, so we use the identity max_x f(x) = − min_x −f(x) to solve (2.60)
The line if __name__ == '__main__': is very common, and operates as follows
If the file is run as the result of an import statement in another file, the clause evaluates to
False, and the code block is not executed
If the file is run directly as a script, the clause evaluates to True, and the code block is executed
To see how this trick works, suppose we have a file in our current working directory called
test_file.py that contains the single line

print(__name__)
Writing Reusable Code

The title of this section might sound uninteresting and a departure from our topic, but it's equally important, if not more so
It's understandable that many economists never consider the basic principles of software development, preoccupied as they are with the applied aspects of trying to implement their projects
However, in programming as in many things, success tends to find those who focus on what is
important, not just what is urgent
The Danger of Copy and Paste For computing the value function of the particular growth model
studied above, the code we have already written (in file optgrowth_v0.py, shown here) is perfectly
fine
However, suppose that we now want to solve a different growth model, with different technology
and preferences
Probably we want to keep our existing code, so let's follow our first instinct and copy-paste it into a new file, modifying the parts that need to change
Instead, we would like bellman_operator to act in conjunction with a more general description
of a model (technology, preferences, etc.)
To do so it's convenient to wrap the model description up in a class and add the Bellman operator as a method
(Review this lecture if you have forgotten the syntax for class definitions)
This idea is implemented in the code below, in file optgrowth.py from the QuantEcon package
"""
Filename: optgrowth.py
Authors: John Stachurski and Thomas Sargent
Solving the optimal growth problem via value function iteration.
"""
from __future__ import division # Omit for Python 3.x
import numpy as np
from scipy.optimize import fminbound
from scipy import interp
class GrowthModel(object):
"""
This class defines the primitives representing the growth model.
Parameters
----------
f : function, optional(default=k**.65)
    The production function; the default is the Cobb-Douglas
    production function with power of .65
beta : scalar(float), optional(default=.95)
The utility discounting parameter
u : function, optional(default=np.log)
The utility function. Default is log utility
grid_max : scalar(int), optional(default=2)
The maximum grid value
grid_size : scalar(int), optional(default=150)
The size of grid to use.
Attributes
----------
f, beta, u : see Parameters
grid : array_like(float, ndim=1)
    The grid over savings.
"""
def __init__(self, f=lambda k: k**0.65, beta=0.95, u=np.log,
grid_max=2, grid_size=150):
self.u, self.f, self.beta = u, f, beta
self.grid = np.linspace(1e-6, grid_max, grid_size)
Of course we could omit the class structure and just pass data to bellman_operator and compute_greedy as a list of separate arguments
For example
As currently written, the code continues iteration until one of two stopping conditions holds
1. Successive iterates become sufficiently close together, in the sense that the maximum deviation between them falls below error_tol
2. The number of iterations exceeds max_iter
Examples of usage for all the code above can be found in the solutions to the exercises
Exercises
Solution notebook
Overview
Linear quadratic (LQ) control refers to a class of dynamic optimization problems that have found
applications in almost every scientific field
This lecture provides an introduction to LQ control and its economic applications
As we will see, LQ systems have a simple structure that makes them an excellent workhorse for a
wide variety of economic problems
Moreover, while the linear-quadratic structure is restrictive, it is in fact far more flexible than it
may appear initially
These themes appear repeatedly below
Mathematically, LQ control problems are closely related to the Kalman filter, although we won't pursue the deeper connections in this lecture
The "linear" part of LQ is a linear law of motion for the state, while the "quadratic" part refers to preferences
Let's begin with the former, move on to the latter, and then put them together into an optimization problem
The Law of Motion Let xt be a vector describing the state of some economic system
Suppose that xt follows a linear law of motion given by
xt+1 = Axt + But + Cwt+1 ,
t = 0, 1, 2, . . .
(2.62)
Here
ut is a control vector, incorporating choices available to a decision maker confronting the
current state xt
• {w_t} is an uncorrelated zero mean shock process satisfying E w_t w_t′ = I, where the right-hand side is the identity matrix
Regarding the dimensions

• x_t is n × 1, A is n × n
• u_t is k × 1, B is n × k
• w_t is j × 1, C is n × j
Example 1 Consider a household budget constraint given by
a t +1 + c t = ( 1 + r ) a t + y t
Here at is assets, r is a fixed interest rate, ct is current consumption, and yt is current non-financial
income
If we suppose that {y_t} is uncorrelated and N(0, σ²), then, taking {w_t} to be standard normal, we can write the system as

a_{t+1} = (1 + r) a_t − c_t + σ w_{t+1}
This is clearly a special case of (2.62), with assets being the state and consumption being the control
Example 2 One unrealistic feature of the previous model is that non-financial income has a zero
mean and is often negative
This can easily be overcome by adding a sufficiently large mean
Hence in this example we take y_t = σ w_{t+1} + μ for some positive real number μ
Another alteration that's useful to introduce (we'll see why soon) is to change the control variable from consumption to the deviation of consumption from some "ideal" quantity c̄

(Most parameterizations will be such that c̄ is large relative to the amount of consumption that is attainable in each period, and hence the household wants to increase consumption)

For this reason, we now take our control to be u_t := c_t − c̄
In terms of these variables, the budget constraint a_{t+1} = (1 + r) a_t − c_t + y_t becomes

a_{t+1} = (1 + r) a_t − u_t − c̄ + σ w_{t+1} + μ   (2.63)
How can we write this new system in the form of equation (2.62)?
If, as in the previous example, we take at as the state, then we run into a problem: the law of
motion contains some constant terms on the right-hand side
This means that we are dealing with an affine function, not a linear one (recall this discussion)
Fortunately, we can easily circumvent this problem by adding an extra state variable
In particular, if we write

( a_{t+1}, 1 )′ = [[1 + r, −c̄ + μ], [0, 1]] ( a_t, 1 )′ + (−1, 0)′ u_t + (σ, 0)′ w_{t+1}   (2.64)

then the first row reproduces (2.63), and the system is in the form (2.62) with

x_t := ( a_t, 1 )′,   A := [[1 + r, −c̄ + μ], [0, 1]],   B := (−1, 0)′,   C := (σ, 0)′   (2.65)
Example 1 A very simple example that satisfies these assumptions is to take R and Q to be identity matrices, so that current loss is

x_t′ I x_t + u_t′ I u_t = ‖x_t‖² + ‖u_t‖²

Thus, for both the state and the control, loss is measured as squared distance from the origin

(In fact the general case (2.66) can also be understood in this way, but with R and Q identifying other, non-Euclidean notions of "distance" from the zero vector)
Intuitively, we can often think of the state x_t as representing deviation from a target, such as

• deviation of inflation from some target level
• deviation of a firm's capital stock from some desired quantity
The aim is to put the state close to the target, while using controls parsimoniously
Example 2 In the household problem studied above, setting R = 0 and Q = 1 yields preferences

x_t′ R x_t + u_t′ Q u_t = u_t² = (c_t − c̄)²

Under this specification, the household's current loss is the squared deviation of consumption from the ideal level c̄
Optimality Finite Horizon
Let's now be precise about the optimization problem we wish to consider, and look at how to solve it
The Objective We will begin with the finite horizon case, with terminal time T ∈ N

In this case, the aim is to choose a sequence of controls {u_0, ..., u_{T−1}} to minimize the objective

E { Σ_{t=0}^{T−1} β^t (x_t′ R x_t + u_t′ Q u_t) + β^T x_T′ R_f x_T }   (2.67)
J_{T−1}(x) := min_u { x′ R x + u′ Q u + β E J_T(A x + B u + C w_T) }   (2.68)

The function J_{T−1} will be called the T−1 value function, and J_{T−1}(x) can be thought of as representing total "loss-to-go" from state x at time T−1 when the decision maker behaves optimally
Now let's step back to T−2

For a decision maker at T−2, the value J_{T−1}(x) plays a role analogous to that played by the terminal loss J_T(x) = x′ R_f x for the decision maker at T−1

That is, J_{T−1}(x) summarizes the future loss associated with moving to state x

The decision maker chooses her control u to trade off current loss against future loss, where

• the next period state is x_{T−1} = A x_{T−2} + B u + C w_{T−1}, and hence depends on the choice of current control
• the cost of landing in state x_{T−1} is J_{T−1}(x_{T−1})
Her problem is therefore

min_u { x_{T−2}′ R x_{T−2} + u′ Q u + β E J_{T−1}(A x_{T−2} + B u + C w_{T−1}) }

Letting

J_{T−2}(x) := min_u { x′ R x + u′ Q u + β E J_{T−1}(A x + B u + C w_{T−1}) }

the pattern for backwards induction is now clear: working back from the terminal condition

J_T(x) = x′ R_f x

we obtain a whole sequence of value functions {J_0, ..., J_T} by repeating this step at each date
The first equality is the Bellman equation from dynamic programming theory specialized to the
finite horizon LQ problem
Now that we have { J0 , . . . , JT }, we can obtain the optimal controls
As a first step, lets find out what the value functions look like
It turns out that every J_t has the form J_t(x) = x′ P_t x + d_t, where P_t is an n × n matrix and d_t is a constant

We can show this by induction, starting from P_T := R_f and d_T = 0
Using this notation, (2.68) becomes

J_{T−1}(x) := min_u { x′ R x + u′ Q u + β E (A x + B u + C w_T)′ P_T (A x + B u + C w_T) }   (2.69)
To obtain the minimizer, we can take the derivative of the r.h.s. with respect to u and set it equal to zero

Applying the relevant rules of matrix calculus, this gives

u = −(Q + β B′ P_T B)^{−1} β B′ P_T A x   (2.70)
Plugging this back into (2.69) and rearranging yields J_{T−1}(x) = x′ P_{T−1} x + d_{T−1}, where

P_{T−1} := R − β² A′ P_T B (Q + β B′ P_T B)^{−1} B′ P_T A + β A′ P_T A   (2.71)

and

d_{T−1} := β trace(C′ P_T C)   (2.72)

Iterating this argument backwards from t = T pins down the whole sequence of value functions via

P_{t−1} = R − β² A′ P_t B (Q + β B′ P_t B)^{−1} B′ P_t A + β A′ P_t A   with   P_T = R_f   (2.73)

d_{t−1} = β (d_t + trace(C′ P_t C))   with   d_T = 0   (2.74)
The optimal control at date t is then u_t = −F_t x_t, where

F_t := (Q + β B′ P_{t+1} B)^{−1} β B′ P_{t+1} A   (2.75)

With this policy, the state evolves according to the closed-loop dynamics

x_{t+1} = (A − B F_t) x_t + C w_{t+1}   (2.76)
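The backward recursion is mechanical to implement. Here is a minimal sketch (our own, not the library implementation in lqcontrol.py shown below; A, B, C, Q, R, Rf are assumed to be conformable 2D NumPy arrays):

import numpy as np
from numpy.linalg import solve

def lq_backward_induction(A, B, C, Q, R, Rf, beta, T):
    # Iterate (2.73) and (2.74) back from P_T = Rf, d_T = 0,
    # collecting the policy matrices F_t from (2.75) along the way
    P, d, Fs = Rf, 0.0, []
    for _ in range(T):
        # F from (2.75), with the current P playing P_{t+1}
        F = solve(Q + beta * B.T.dot(P).dot(B), beta * B.T.dot(P).dot(A))
        Fs.append(F)
        # d update from (2.74), using the current P
        d = beta * (d + np.trace(C.T.dot(P).dot(C)))
        # P update: R + beta A'PA - beta A'PB F, equivalent to (2.73)
        P = R + beta * A.T.dot(P).dot(A) - beta * A.T.dot(P).dot(B).dot(F)
    return P, d, Fs[::-1]   # Fs[t] is then F_t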
An Application As an application of the finite horizon theory, consider again the household problem described above, where the consumer's objective is to minimize

E { Σ_{t=0}^{T−1} β^t (c_t − c̄)² + β^T q a_T² }   (2.77)
Here q is a large positive constant, the role of which is to induce the consumer to target zero debt
at the end of her life
(Without such a constraint, the optimal choice is to choose c_t = c̄ in each period, letting assets adjust accordingly)
As before we set y_t = σ w_{t+1} + μ and u_t := c_t − c̄, after which the constraint can be written as in (2.63)

We saw how this constraint could be manipulated into the LQ formulation x_{t+1} = A x_t + B u_t + C w_{t+1} by setting x_t = (a_t, 1)′ and using the definitions in (2.65)
To match with this state and control, the objective function (2.77) can be written in the form of
(2.67) by choosing
Q := 1,   R := [[0, 0], [0, 0]],   and   R_f := [[q, 0], [0, 0]]
Now that the problem is expressed in LQ form, we can proceed to the solution by applying (2.73)
and (2.75)
After generating shocks w1 , . . . , wT , the dynamics for assets and consumption can be simulated
via (2.76)
We provide code for all these operations below
The following figure was computed using this code, with r = 0.05, β = 1/(1 + r), c̄ = 2, μ = 1, σ = 0.25, T = 45 and q = 10⁶
The shocks {wt } were taken to be iid and standard normal
The top panel shows the time path of consumption ct and income yt in the simulation
As anticipated by the discussion on consumption smoothing, the time path of consumption is
much smoother than that for income
(But note that consumption becomes more irregular towards the end of life, when the zero final
asset requirement impinges more on consumption choices)
The second panel in the figure shows that the time path of assets a_t is closely correlated with cumulative unanticipated income, where the latter is defined as

z_t := σ Σ_{j=0}^{t} w_j
A key message is that unanticipated windfall gains are saved rather than consumed, while unanticipated negative shocks are met by reducing assets
(Again, this relationship breaks down towards the end of life due to the zero final asset requirement)
These results are relatively robust to changes in parameters
For example, let's increase β from 1/(1 + r) ≈ 0.952 to 0.96 while keeping other parameters fixed
This consumer is slightly more patient than the last one, and hence puts relatively more weight
on later consumption values
Let's now consider a number of standard extensions to the LQ problem treated above
Nonstationary Parameters In some settings it can be desirable to allow A, B, C, R and Q to depend on t
For the sake of simplicity, weve chosen not to treat this extension in our implementation given
below
However, the loss of generality is not as large as you might first imagine
In fact, we can tackle many nonstationary models from within our implementation by suitable
choice of state variables
One illustration is given below
For further examples and a more systematic treatment, see [HS13], section 2.4
Adding a Cross-Product Term In some LQ problems, preferences include a cross-product term u_t′ N x_t, so that the objective function becomes

E { Σ_{t=0}^{T−1} β^t (x_t′ R x_t + u_t′ Q u_t + 2 u_t′ N x_t) + β^T x_T′ R_f x_T }   (2.78)

The recursion for {P_t} in (2.73) becomes

P_{t−1} = R − (β B′ P_t A + N)′ (Q + β B′ P_t B)^{−1} (β B′ P_t A + N) + β A′ P_t A   with   P_T = R_f   (2.79)

and the policies in (2.75) are modified to u_t = −F_t x_t, where

F_t := (Q + β B′ P_{t+1} B)^{−1} (β B′ P_{t+1} A + N)   (2.80)
Infinite Horizon Finally, we consider the infinite horizon case, with cross-product term, unchanged dynamics and objective function given by

E { Σ_{t=0}^{∞} β^t (x_t′ R x_t + u_t′ Q u_t + 2 u_t′ N x_t) }   (2.81)
In the infinite horizon case, optimal policies can depend on time only if time itself is a component of the state vector x_t; since it is not here, the optimal policy is stationary

In other words, there exists a fixed matrix F such that u_t = −F x_t for all t
This stationarity is intuitive; after all, the decision maker faces the same infinite horizon at every stage, with only the current state changing
Not surprisingly, P and d are also constant
The stationary matrix P is given by the fixed point of (2.73)
Equivalently, it is the solution P to the discrete time algebraic Riccati equation
P = R − (β B′ P A + N)′ (Q + β B′ P B)^{−1} (β B′ P A + N) + β A′ P A   (2.82)
Equation (2.82) is also called the LQ Bellman equation, and the map that sends a given P into the
right-hand side of (2.82) is called the LQ Bellman operator
The stationary optimal policy for this model is

u = −F x   where   F := (Q + β B′ P B)^{−1} (β B′ P A + N)   (2.83)

The sequence {d_t} from (2.74) is also constant, with stationary value

d := β (1 − β)^{−1} trace(C′ P C)   (2.84)
We have put together some code for solving finite and infinite horizon linear quadratic control
problems
The code can be found in the file lqcontrol.py from the QuantEcon package
You can view the program on GitHub but we repeat it here for convenience
"""
Filename: lqcontrol.py
Authors: Thomas J. Sargent, John Stachurski
Provides a class called LQ for solving linear quadratic control
problems.
"""
import numpy as np
from numpy import dot
from scipy.linalg import solve
from .matrix_eqn import solve_discrete_riccati
class LQ:
r"""
This class is for analyzing linear quadratic optimal control
problems of either the infinite horizon form
.. math::
\min E \sum_{t=0}^{\infty} \beta^t r(x_t, u_t)
with
.. math::
r(x_t, u_t) := x_t' R x_t + u_t' Q u_t + 2 u_t' N x_t
or the finite horizon form
.. math::
\min E \sum_{t=0}^{T-1} \beta^t r(x_t, u_t) + \beta^T x_T' R_f x_T
Both are minimized subject to the law of motion
.. math::
x_{t+1} = A x_t + B u_t + C w_{t+1}
Here x is n x 1, u is k x 1, w is j x 1 and the matrices are
conformable for these dimensions. The sequence {w_t} is assumed to
be white noise, with zero mean and :math:`E w_t w_t' = I`, the j x j
identity.
If C is not supplied as a parameter, the model is assumed to be
deterministic (and C is set to a zero matrix of appropriate
dimension).
Parameters
----------
Q : array_like(float)
    Q is the payoff(or cost) matrix that corresponds with the
    control variable u and is `k x k`. Should be symmetric and
    nonnegative definite
R : array_like(float)
    R is the payoff(or cost) matrix that corresponds with the
    state variable x and is `n x n`. Should be symmetric and
    non-negative definite
N : array_like(float)
    N is the cross product term in the payoff, as above. It should
    be `k x n`.
A : array_like(float)
A is part of the state transition as described above. It should
be `n x n`
B : array_like(float)
B is part of the state transition as described above. It should
be `n x k`
C : array_like(float), optional(default=None)
C is part of the state transition as described above and
corresponds to the random variable today. If the model is
deterministic then C should take default value of `None`
beta : scalar(float), optional(default=1)
beta is the discount parameter
T : scalar(int), optional(default=None)
T is the number of periods in a finite horizon problem.
Rf : array_like(float), optional(default=None)
    Rf is the final (in a finite horizon model) payoff(or cost)
    matrix that corresponds with the state variable x and is `n x
    n`. Should be symmetric and non-negative definite
Attributes
----------
Q, R, N, A, B, C, beta, T, Rf : see Parameters
P : array_like(float)
P is part of the value function representation of V(x) = x'Px + d
d : array_like(float)
d is part of the value function representation of V(x) = x'Px + d
F : array_like(float)
    F is the policy rule that determines the choice of control
    in each period.
"""

def update_values(self):
    """
    This method is for updating in the finite horizon case. It
    shifts the current value function

    .. math::

        V_t(x) = x' P_t x + d_t

    and the optimal policy :math:`F_t` one step *back* in time,
    replacing the pair :math:`P_t` and :math:`d_t` with
    :math:`P_{t-1}` and :math:`d_{t-1}`
    """
    # (body omitted here; see the full listing on GitHub)

def stationary_values(self):
    """
    Computes the matrix P and scalar d that represent the value
    function in the infinite horizon case, together with the
    stationary policy F, via the Riccati equation (2.82).
Returns
-------
P : array_like(float)
    P is part of the value function representation of
    V(x) = x'Px + d
F : array_like(float)
F is the policy rule that determines the choice of control
in each period.
d : array_like(float)
d is part of the value function representation of
V(x) = x'Px + d
"""
# === simplify notation === #
Q, R, A, B, N, C = self.Q, self.R, self.A, self.B, self.N, self.C
# === solve Riccati equation, obtain P === #
A0, B0 = np.sqrt(self.beta) * A, np.sqrt(self.beta) * B
P = solve_discrete_riccati(A0, B0, R, Q, N)
# == Compute F == #
S1 = Q + self.beta * dot(B.T, dot(P, B))
S2 = self.beta * dot(B.T, dot(P, A)) + N
F = solve(S1, S2)
# == Compute d == #
d = self.beta * np.trace(dot(P, dot(C, C.T))) / (1 - self.beta)
return P, F, d
In the module, the various updating, simulation and fixed point methods are wrapped in a class called LQ, which includes

Instance data:

• The required parameters Q, R, A, B and optional parameters C, beta, T, R_f, N specifying a given LQ model
  * set T and R_f to None in the infinite horizon case
  * set C = None (or zero) in the deterministic case
• the value function and policy data
  * d_t, P_t, F_t in the finite horizon case
  * d, P, F in the infinite horizon case

Methods:

• update_values: shifts d_t, P_t, F_t to their t − 1 values via (2.73), (2.74) and (2.75)
• stationary_values: computes P, d, F in the infinite horizon case
• compute_sequence: simulates the dynamics of x_t, u_t, w_t given x_0 and assuming standard normal shocks
An example of usage is given in lq_permanent_1.py from the main repository, the contents of
which are shown below
This program can be used to replicate the figures shown in our section on the permanent income
model
(Some of the plotting techniques are rather fancy and you can ignore those details if you wish)
import numpy as np
import matplotlib.pyplot as plt
from quantecon import LQ
# == Model parameters == #
r = 0.05
beta = 1 / (1 + r)
T = 45
c_bar = 2
sigma = 0.25
mu = 1
q = 1e6
# == Formulate as an LQ problem == #
Q = 1
R = np.zeros((2, 2))
Rf = np.zeros((2, 2))
Rf[0, 0] = q
A = [[1 + r, -c_bar + mu],
     [0,     1]]
B = [[-1],
     [0]]
C = [[sigma],
     [0]]
# == Compute solutions and simulate == #
lq = LQ(Q, R, A, B, C, beta=beta, T=T, Rf=Rf)
x0 = (0, 1)
xp, up, wp = lq.compute_sequence(x0)
# == Convert back to assets, consumption and income == #
assets = xp[0, :]            # a_t
c = up.flatten() + c_bar     # c_t
income = wp[0, 1:] + mu      # y_t
# == Plot results == #
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))
plt.subplots_adjust(hspace=0.5)
for i in range(n_rows):
axes[i].grid()
axes[i].set_xlabel(r'Time')
bbox = (0., 1.02, 1., .102)
legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.7}
axes[0].plot(list(range(1, T+1)), income, 'g-', label="non-financial income",
**p_args)
axes[0].plot(list(range(T)), c, 'k-', label="consumption", **p_args)
Further Applications

Application 1: Nonstationary Income Previously we studied the consumption problem with objective

E { Σ_{t=0}^{T−1} β^t (c_t − c̄)² + β^T q a_T² }   (2.85)

subject to a_{t+1} = (1 + r) a_t − c_t + y_t, t ≥ 0
For income we now take y_t = p(t) + σ w_{t+1} where p(t) := m_0 + m_1 t + m_2 t²
(In the next section we employ some tricks to implement a more sophisticated model)
The coefficients m_0, m_1, m_2 are chosen such that p(0) = 0, p(T/2) = μ, and p(T) = 0

You can confirm that the specification m_0 = 0, m_1 = T μ / (T/2)², m_2 = −μ / (T/2)² satisfies these constraints
To put this into an LQ setting, consider the budget constraint, which becomes

a_{t+1} = (1 + r) a_t − u_t − c̄ + m_1 t + m_2 t² + σ w_{t+1}   (2.86)
The fact that at+1 is a linear function of ( at , 1, t, t2 ) suggests taking these four variables as the state
vector xt
Once a good choice of state and control (recall u_t = c_t − c̄) has been made, the remaining specifications fall into place relatively easily
Thus, for the state and control we take x_t := ( a_t, 1, t, t² )′ and u_t = c_t − c̄, with primitives

A := [[1 + r, −c̄, m_1, m_2], [0, 1, 0, 0], [0, 1, 1, 0], [0, 1, 2, 1]],   B := (−1, 0, 0, 0)′,   C := (σ, 0, 0, 0)′   (2.87)
If you expand the expression xt+1 = Axt + But + Cwt+1 using this specification, you will find that
assets follow (2.86) as desired, and that the other state variables also update appropriately
To implement preference specification (2.85) we take
Q := 1,   R := a 4 × 4 matrix of zeros,   and   R_f := [[q, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]   (2.88)
The next figure shows a simulation of consumption and assets computed using the
compute_sequence method of lqcontrol.py with initial assets set to zero
To model retirement, we suppose that income takes the form

$$
y_t = \begin{cases} p(t) + \sigma w_{t+1} & \text{if } t \leq K \\ s & \text{if } t > K \end{cases}
\tag{2.89}
$$

Here

- $p(t) := m_1 t + m_2 t^2$ with the coefficients $m_1, m_2$ chosen such that $p(K) = \mu$ and $p(0) = p(2K) = 0$
- $s$ is retirement income
We suppose that preferences are unchanged and given by (2.77)

The budget constraint is also unchanged and given by $a_{t+1} = (1+r)a_t - c_t + y_t$
Our aim is to solve this problem and simulate paths using the LQ techniques described in this
lecture
In fact this is a nontrivial problem, as the kink in the dynamics (2.89) at K makes it very difficult
to express the law of motion as a fixed-coefficient linear system
However, we can still use our LQ methods here by suitably linking two component LQ problems
These two LQ problems describe the consumer's behavior during her working life (lq_working) and retirement (lq_retired)
(This is possible because in the two separate periods of life, the respective income processes [polynomial trend and constant] each fit the LQ framework)
The basic idea is that although the whole problem is not a single time-invariant LQ problem, it is
still a dynamic programming problem, and hence we can use appropriate Bellman equations at
every stage
Based on this logic, we can
1. solve lq_retired by the usual backwards induction procedure, iterating back to the start of
retirement
2. take the start-of-retirement value function generated by this process, and use it as the terminal condition $R_f$ to feed into the lq_working specification

3. solve lq_working by backwards induction from this choice of $R_f$, iterating back to the start of working life
This process gives the entire life-time sequence of value functions and optimal policies
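A minimal sketch of this two-stage procedure, assuming a quantecon-style LQ class with the value-function matrix exposed as the attribute P, and hypothetical matrices A_r, B_r, C_r (retirement) and A_w, B_w, C_w (working life) defined elsewhere:

from quantecon import LQ

# Stage 1: solve lq_retired back to the start of retirement.
# (Q, R, Rf, beta, T, K are assumed to be defined already.)
lq_retired = LQ(Q, R, A_r, B_r, C_r, beta=beta, T=T - K, Rf=Rf)
for _ in range(T - K):
    lq_retired.update_values()       # backwards induction, one step per call

# Stage 2: feed the start-of-retirement value function into lq_working
# as its terminal condition, then iterate back to the start of working life.
lq_working = LQ(Q, R, A_w, B_w, C_w, beta=beta, T=K, Rf=lq_retired.P)
for _ in range(K):
    lq_working.update_values()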
The next figure shows one simulation based on this procedure
The full set of parameters used in the simulation is discussed in Exercise 2, where you are asked to
replicate the figure
Once again, the dominant feature observable in the simulation is consumption smoothing
The asset path fits well with standard life cycle theory, with dissaving early in life followed by
later saving
Assets peak at retirement and subsequently decline
Application 3: Monopoly with Adjustment Costs Consider a monopolist facing the stochastic inverse demand function
$$p_t = a_0 - a_1 q_t + d_t$$
Here qt is output, and the demand shock dt follows
$$d_{t+1} = \rho d_t + \sigma w_{t+1}$$

where $\{w_t\}$ is iid and standard normal
The monopolist maximizes the expected discounted sum of present and future profits
$$
\mathbb{E}\left\{ \sum_{t=0}^{\infty} \beta^t \pi_t \right\}
\quad \text{where} \quad
\pi_t := p_t q_t - c q_t - \gamma (q_{t+1} - q_t)^2
\tag{2.90}
$$

Here

- $\gamma (q_{t+1} - q_t)^2$ represents adjustment costs
- $c$ is average cost of production
This can be formulated as an LQ problem and then solved and simulated, but first let's study the problem and try to get some intuition

One way to start thinking about the problem is to consider what would happen if $\gamma = 0$
Without adjustment costs there is no intertemporal trade-off, so the monopolist will choose output
to maximize current profit in each period
It's not difficult to show that profit-maximizing output is

$$\bar q_t := \frac{a_0 - c + d_t}{2 a_1}$$
$$
\sum_{t=0}^{\infty} \beta^t \left[ a_1 (q_t - \bar q_t)^2 + \gamma u_t^2 \right]
\tag{2.91}
$$
It's now relatively straightforward to find $R$ and $Q$ such that (2.91) can be written as (2.81)
Furthermore, the matrices A, B and C from (2.62) can be found by writing down the dynamics of
each element of the state
Exercise 3 asks you to complete this process, and reproduce the preceding figures
Exercises
For lq_working, preferences are the same, except that $R_f$ should be replaced by the final value function that emerges from iterating lq_retired back to the start of retirement
With some careful footwork, the simulation can be generated by patching together the simulations
from these two separate models
Exercise 3 Reproduce the figures from the monopolist application given above
For parameters, use $a_0 = 5$, $a_1 = 0.5$, $\sigma = 0.15$, $\rho = 0.9$, $\beta = 0.95$ and $c = 2$, while $\gamma$ varies between 1 and 50 (see figures)
Solutions
Solution notebook
We will also learn about how a rational expectations equilibrium can be characterized as a fixed
point of a mapping from a perceived law of motion to an actual law of motion
Equality between a perceived and an actual law of motion for endogenous market-wide objects
captures in a nutshell what the rational expectations equilibrium concept is all about
Finally, we will learn about the important Big K, little k trick, a modeling device widely used in
macroeconomics
Except that for us

- instead of Big K it will be Big Y
- instead of little k it will be little y
The Big Y, little y trick This widely used method applies in contexts in which a representative
firm or agent is a price taker operating within a competitive equilibrium
We want to impose that
The representative firm or individual takes aggregate Y as given when it chooses individual
y, but . . .
At the end of the day, Y = y, so that the representative firm is indeed representative
The Big Y, little y trick accomplishes these two goals by
Taking Y as a given state variable or process, beyond the control of the representative
individual, when posing the problem of the individual firm or agent; but . . .
Imposing $Y = y$ after having solved the individual's optimization problem
Please watch for how this strategy is applied as the lecture unfolds
We begin by applying the Big Y, little y trick in a very simple static context
A simple static example of the Big Y, little y trick Consider a static model in which a collection
of n firms produce a homogeneous good that is sold in a competitive market
Each of these n firms sells output y
The price p of the good lies on an inverse demand curve
$$p = a_0 - a_1 Y \tag{2.92}$$
where

- $a_i > 0$ for $i = 0, 1$
- $Y = ny$ is the market-wide level of output
Each firm has total cost function
$$c(y) = c_1 y + 0.5 c_2 y^2, \qquad c_i > 0 \text{ for } i = 1, 2 \tag{2.93}$$

The representative firm takes the price $p$ as given and chooses $y$ to maximize profit $py - c(y)$, which yields the first-order condition

$$a_0 - a_1 Y - c_1 - c_2 y = 0 \tag{2.94}$$
At this point, but not before, we substitute $Y = ny$ into (2.94) to obtain the following linear equation

$$a_0 - c_1 - (a_1 + n^{-1} c_2)\, Y = 0 \tag{2.95}$$
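As a quick numerical check, (2.95) can be solved directly for the equilibrium level of output; the parameter values below are purely illustrative assumptions.

# Solve a0 - c1 - (a1 + c2/n) Y = 0 for aggregate output Y, then back out price
a0, a1, c1, c2, n = 100.0, 0.05, 1.0, 2.0, 10
Y = (a0 - c1) / (a1 + c2 / n)
p = a0 - a1 * Y      # inverse demand curve (2.92)
print(Y, p)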
Our first illustration of rational expectations equilibrium involves a market with n firms, each of
whom seeks to maximize profits in the face of adjustment costs
The adjustment costs encourage the firms to make gradual adjustments, which in turn requires
consideration of future prices
Individual firms understand that prices are determined by aggregate supply from other firms, and
hence each firm must forecast this quantity
In our context, a forecast is expressed as a belief about the law of motion for the aggregate state
Rational expectations equilibrium is obtained when this belief coincides with the actual law of
motion generated by production choices made on the basis of this belief
The price of the good lies on the inverse demand curve

$$p_t = a_0 - a_1 Y_t \tag{2.96}$$

where

- $a_i > 0$ for $i = 0, 1$
- $Y_t = n y_t$ is the market-wide level of output
The Firm's Problem The firm is a price taker
While it faces no uncertainty, it does face adjustment costs
In particular, it chooses a production plan to maximize

$$\sum_{t=0}^{\infty} \beta^t r_t \tag{2.97}$$

where

$$r_t := p_t y_t - \frac{\gamma (y_{t+1} - y_t)^2}{2}, \qquad y_0 \text{ given} \tag{2.98}$$
The Firm's Beliefs We suppose the firm believes that market-wide output $Y_t$ follows the law of motion

$$Y_{t+1} = H(Y_t) \tag{2.99}$$

where $Y_0$ is a known initial condition

The belief function $H$ is an equilibrium object, and hence remains to be determined
Optimal Behavior Given Beliefs For now let's fix a particular belief $H$ in (2.99) and investigate the firm's response

Let $v$ be the corresponding value function for the firm's problem
The value function satisfies the Bellman equation

$$v(y, Y) = \max_{y'} \left\{ a_0 y - a_1 y Y - \frac{\gamma (y' - y)^2}{2} + \beta v(y', H(Y)) \right\} \tag{2.100}$$

Let $h$ denote the corresponding policy function:

$$h(y, Y) := \arg\max_{y'} \left\{ a_0 y - a_1 y Y - \frac{\gamma (y' - y)^2}{2} + \beta v(y', H(Y)) \right\} \tag{2.101}$$

so that the firm's output evolves according to

$$y_{t+1} = h(y_t, Y_t) \tag{2.102}$$

The first-order condition for the maximization on the right side of (2.100) is

$$-\gamma(y' - y) + \beta v_y(y', H(Y)) = 0 \tag{2.103}$$

A well-known envelope result [BS79] implies that to differentiate $v$ with respect to $y$ we can naively differentiate the right-hand side of (2.100), giving

$$v_y(y, Y) = a_0 - a_1 Y + \gamma (y' - y)$$

Substituting this equation into (2.103) gives the Euler equation

$$-\gamma(y_{t+1} - y_t) + \beta\left[a_0 - a_1 Y_{t+1} + \gamma(y_{t+2} - y_{t+1})\right] = 0 \tag{2.104}$$
In the process of solving its Bellman equation, the firm sets an output path that satisfies (2.104),
taking (2.99) as given, and subject to
- the initial conditions for $(y_0, Y_0)$
- the terminal condition $\lim_{t \to \infty} \beta^t y_t v_y(y_t, Y_t) = 0$
This last condition is called the transversality condition, and acts as a first-order necessary condition
at infinity
The firm's decision rule solves the difference equation (2.104) subject to the given initial condition $y_0$ and the transversality condition
Note that solving the Bellman equation (2.100) for v and then h in (2.102) yields a decision rule
that automatically imposes both the Euler equation (2.104) and the transversality condition
Recalling that $Y_t = n y_t$, the actual law of motion for market-wide output is then

$$Y_{t+1} = n h(Y_t/n, Y_t) \tag{2.105}$$
Thus, when firms believe that the law of motion for market-wide output is (2.99), their optimizing
behavior makes the actual law of motion be (2.105)
Definition of Rational Expectations Equilibrium A rational expectations equilibrium or recursive
competitive equilibrium of the model with adjustment costs is a decision rule h and an aggregate
law of motion H such that
1. Given belief $H$, the map $h$ is the firm's optimal policy function

2. The law of motion $H$ satisfies $H(Y) = n h(Y/n, Y)$ for all $Y$
Thus, a rational expectations equilibrium equates the perceived and actual laws of motion (2.99)
and (2.105)
Fixed point characterization As we've seen, the firm's optimum problem induces a mapping $\Phi$ from a perceived law of motion $H$ for market-wide output to an actual law of motion $\Phi(H)$

The mapping $\Phi$ is the composition of two operations, taking a perceived law of motion into a decision rule via (2.100)-(2.102), and a decision rule into an actual law via (2.105)

The $H$ component of a rational expectations equilibrium is a fixed point of $\Phi$
Computation of the Equilibrium
Now lets consider the problem of computing the rational expectations equilibrium
Misbehavior of $\Phi$ Readers accustomed to dynamic programming arguments might try to address this problem by choosing some guess $H_0$ for the aggregate law of motion and then iterating with $\Phi$

Unfortunately, the mapping $\Phi$ is not a contraction

In particular, there is no guarantee that direct iterations on $\Phi$ converge⁷
Fortunately, there is another method that works here
The method exploits a general connection between equilibrium and Pareto optimality expressed in the fundamental theorems of welfare economics (see, e.g., [MCWG95])
⁷ A literature that studies whether models populated with agents who learn can converge to rational expectations equilibria features iterations on a modification of the mapping $\Phi$ that can be approximated as $\lambda \Phi + (1-\lambda) I$. Here $I$ is the identity operator and $\lambda \in (0,1)$ is a relaxation parameter. See [MS89] and [EH01] for statements and applications of this approach to establish conditions under which collections of adaptive agents who use least squares learning converge to a rational expectations equilibrium.
Lucas and Prescott [LP71] used this method to construct a rational expectations equilibrium
The details follow
A Planning Problem Approach Our plan of attack is to match the Euler equations of the market problem with those for a single-agent planning problem
As well see, this planning problem can be solved by LQ control
The optimal quantities from the planning problem are then rational expectations equilibrium
quantities
The rational expectations equilibrium price can be obtained as a shadow price in the planning
problem
For convenience, in this section we set n = 1
We first compute a sum of consumer and producer surplus at time t
s(Yt , Yt+1 ) :=
Z Yt
0
( a0 a1 x ) dx
(Yt+1 Yt )2
2
(2.106)
The first term is the area under the demand curve, while the second is the social costs of changing
output
The planning problem is to choose a production plan $\{Y_t\}$ to maximize

$$\sum_{t=0}^{\infty} \beta^t s(Y_t, Y_{t+1})$$

subject to an initial condition for $Y_0$. Evaluating the integral in (2.106) gives the quadratic form $a_0 Y_t - a_1 Y_t^2/2$, so the Bellman equation for the planning problem is

$$V(Y) = \max_{Y'} \left\{ a_0 Y - \frac{a_1}{2} Y^2 - \frac{\gamma (Y' - Y)^2}{2} + \beta V(Y') \right\} \tag{2.107}$$

The associated first-order condition is

$$-\gamma (Y' - Y) + \beta V'(Y') = 0 \tag{2.108}$$

Applying the same envelope reasoning as before gives $V'(Y) = a_0 - a_1 Y + \gamma (Y' - Y)$, and substituting this into (2.108) yields the Euler equation

$$-\gamma (Y_{t+1} - Y_t) + \beta \left[ a_0 - a_1 Y_{t+1} + \gamma (Y_{t+2} - Y_{t+1}) \right] = 0 \tag{2.109}$$
The Key Insight Return to equation (2.104) and set $y_t = Y_t$ for all $t$

(Recall that for this section we've set $n = 1$ to simplify the calculations)
A small amount of algebra will convince you that when yt = Yt , equations (2.109) and (2.104) are
identical
Thus, the Euler equation for the planning problem matches the second-order difference equation
that we derived by
1. finding the Euler equation of the representative firm and
2. substituting into it the expression Yt = nyt that makes the representative firm be representative
If it is appropriate to apply the same terminal conditions for these two difference equations, which
it is, then we have verified that a solution of the planning problem also is a rational expectations
equilibrium
It follows that for this example we can compute an equilibrium by forming the optimal linear
regulator problem corresponding to the Bellman equation (2.107)
The optimal policy function for the planning problem is the aggregate law of motion H that the
representative firm faces within a rational expectations equilibrium.
Structure of the Law of Motion As you are asked to show in the exercises, the fact that the planner's problem is an LQ problem implies an optimal policy, and hence aggregate law of motion, taking the form

$$Y_{t+1} = \kappa_0 + \kappa_1 Y_t \tag{2.110}$$

for some parameter pair $\kappa_0, \kappa_1$
Now that we know the aggregate law of motion is linear, we can see from the firm's Bellman equation (2.100) that the firm's problem can be framed as an LQ problem

As you're asked to show in the exercises, the LQ formulation of the firm's problem implies a law of motion that looks as follows

$$y_{t+1} = h_0 + h_1 y_t + h_2 Y_t \tag{2.111}$$

Hence a rational expectations equilibrium will be defined by the parameters $(\kappa_0, \kappa_1, h_0, h_1, h_2)$ in (2.110)-(2.111)
Exercises
Express the solution of the firm's problem in the form (2.111) and give the values for each $h_j$

If there were $n$ identical competitive firms all behaving according to (2.111), what would (2.111) imply for the actual law of motion (2.99) for market supply?
Exercise 2 Consider the following $\kappa_0, \kappa_1$ pairs as candidates for the aggregate law of motion component of a rational expectations equilibrium (see (2.110))
Extending the program that you wrote for exercise 1, determine which if any satisfy the definition
of a rational expectations equilibrium
(94.0886298678, 0.923409232937)
(93.2119845412, 0.984323478873)
(95.0818452486, 0.952459076301)
Describe an iterative algorithm that uses the program that you wrote for exercise 1 to compute a
rational expectations equilibrium
(You are not being asked actually to use the algorithm you are suggesting)
Exercise 3 Recall the planners problem described above
1. Formulate the planners problem as an LQ problem
2. Solve it using the same parameter values as in exercise 1: $a_0 = 100$, $a_1 = 0.05$, $\beta = 0.95$, $\gamma = 10$

3. Represent the solution in the form $Y_{t+1} = \kappa_0 + \kappa_1 Y_t$
4. Compare your answer with the results from exercise 2
Exercise 4 A monopolist faces the industry demand curve (2.96) and chooses $\{Y_t\}$ to maximize $\sum_{t=0}^{\infty} \beta^t r_t$ where

$$r_t = p_t Y_t - \frac{\gamma (Y_{t+1} - Y_t)^2}{2}$$
Formulate this problem as an LQ problem
Compute the optimal policy using the same parameters as the previous exercise
In particular, solve for the parameters in
Yt+1 = m0 + m1 Yt
Compare your results with the previous exercise. Comment.
Solutions
Solution notebook
Example: A duopoly model Two firms are the only producers of a good, the demand for which is governed by a linear inverse demand function
$$p = a_0 - a_1 (q_1 + q_2) \tag{2.112}$$

Here $p = p_t$ is the price of the good, $q_i = q_{it}$ is the output of firm $i = 1, 2$ at time $t$, and $a_0 > 0, a_1 > 0$
In (2.112) and what follows,

- the time subscript is suppressed when possible to simplify notation
- $\hat x$ denotes the next period value of variable $x$
Each firm recognizes that its output affects total output and therefore influences the market price
The one-period payoff function of firm $i$ is price times quantity minus adjustment costs:

$$\pi_i = p q_i - \gamma (\hat q_i - q_i)^2, \qquad \gamma > 0 \tag{2.113}$$

Substituting the inverse demand curve (2.112) into (2.113) lets us express the one-period payoff as

$$\pi_i(q_i, q_{-i}, \hat q_i) = a_0 q_i - a_1 q_i^2 - a_1 q_i q_{-i} - \gamma (\hat q_i - q_i)^2 \tag{2.114}$$
Firm $i$ chooses a decision rule that sets next period quantity $\hat q_i$ as a function $f_i$ of the current state $(q_i, q_{-i})$
An essential aspect of a Markov perfect equilibrium is that each firm takes the decision rule of the
other firm as known and given
Given $f_{-i}$, the Bellman equation of firm $i$ is

$$v_i(q_i, q_{-i}) = \max_{\hat q_i} \left\{ \pi_i(q_i, q_{-i}, \hat q_i) + \beta v_i(\hat q_i, f_{-i}(q_{-i}, q_i)) \right\} \tag{2.115}$$
Definition A Markov perfect equilibrium for this game is a pair of value functions $(v_1, v_2)$ and a pair of policy functions $(f_1, f_2)$ such that, for each $i \in \{1, 2\}$ and each possible state,

- The value function $v_i$ satisfies the Bellman equation (2.115)
- The maximizer on the right side of (2.115) is equal to $f_i(q_i, q_{-i})$
The adjective Markov denotes that the equilibrium decision rules depend only on the current
values of the state variables, not other parts of their histories
Perfect means complete, in the sense that the equilibrium is constructed by backward induction
and hence builds in optimizing behavior for each firm for all possible future states
This includes many states that will not be realized when we iterate forward on the pair of equilibrium strategies $f_1, f_2$
Computation One strategy for computing a Markov perfect equilibrium is iterating to convergence on pairs of Bellman equations and decision rules

In particular, let $v_i^j, f_i^j$ be the value function and policy function for firm $i$ at the $j$-th iteration

Imagine constructing the iterates

$$v_i^{j+1}(q_i, q_{-i}) = \max_{\hat q_i} \left\{ \pi_i(q_i, q_{-i}, \hat q_i) + \beta v_i^j(\hat q_i, f_{-i}(q_{-i}, q_i)) \right\} \tag{2.116}$$
As we saw in the duopoly example, the study of Markov perfect equilibria in games with two
players leads us to an interrelated pair of Bellman equations
In linear quadratic dynamic games, these stacked Bellman equations become stacked Riccati
equations with a tractable mathematical structure
We'll lay out that structure in a general setup and then apply it to some simple problems
A Coupled Linear Regulator Problem We consider a general linear quadratic regulator game
with two players
For convenience, well start with a finite horizon formulation, where t0 is the initial date and t1 is
the common terminal date
Player $i$ takes $\{u_{-it}\}$ as given and minimizes

$$\sum_{t=t_0}^{t_1 - 1} \beta^{t-t_0} \left\{ x_t' R_i x_t + u_{it}' Q_i u_{it} + u_{-it}' S_i u_{-it} + 2 x_t' W_i u_{it} + 2 u_{-it}' M_i u_{it} \right\} \tag{2.117}$$

subject to the law of motion

$$x_{t+1} = A x_t + B_1 u_{1t} + B_2 u_{2t} \tag{2.118}$$

Here

- $x_t$ is an $n \times 1$ state vector and $u_{it}$ is a $k_i \times 1$ vector of controls for player $i$
- $R_i$ is $n \times n$
- $S_i$ is $k_{-i} \times k_{-i}$
- $Q_i$ is $k_i \times k_i$
- $W_i$ is $n \times k_i$
- $M_i$ is $k_{-i} \times k_i$
- $A$ is $n \times n$
- $B_i$ is $n \times k_i$
Given the other player's policy, player 1's problem is a standard LQ problem: minimize (2.119) subject to

$$x_{t+1} = \Lambda_{1t} x_t + B_1 u_{1t} \tag{2.120}$$

where

- $\Lambda_{it} := A - B_{-i} F_{-it}$
- $\Pi_{it} := R_i + F_{-it}' S_i F_{-it}$
- $\Gamma_{it} := W_i' - M_i' F_{-it}$

This is an optimal linear regulator problem, which can be solved by working backwards

The policy rule that solves this problem is

$$F_{1t} = (Q_1 + \beta B_1' P_{1t+1} B_1)^{-1} (\beta B_1' P_{1t+1} \Lambda_{1t} + \Gamma_{1t}) \tag{2.121}$$

where $P_{1t}$ solves an associated matrix Riccati difference equation (2.122). By symmetry, the policy of player 2 is

$$F_{2t} = (Q_2 + \beta B_2' P_{2t+1} B_2)^{-1} (\beta B_2' P_{2t+1} \Lambda_{2t} + \Gamma_{2t}) \tag{2.123}$$

with $P_{2t}$ solving the corresponding Riccati difference equation (2.124)
Infinite horizon We often want to compute the solutions of such games for infinite horizons, in the hope that the decision rules $F_{it}$ settle down to be time invariant as $t_1 \to +\infty$

In practice, we usually fix $t_1$ and compute the equilibrium of an infinite horizon game by driving $t_0 \to -\infty$

This is the approach we adopt in the next section
Implementation Below we display a function called nnash that computes a Markov perfect equilibrium of the infinite horizon linear quadratic dynamic game in the manner described above
from __future__ import division, print_function
import numpy as np
from numpy import dot, eye
from scipy.linalg import solve


def nnash(A, B1, B2, R1, R2, Q1, Q2, S1, S2, W1, W2, M1, M2,
          beta=1.0, tol=1e-8, max_iter=1000):
    r"""
    Compute the limit of a Nash linear quadratic dynamic game. In this
    problem, player i minimizes

    .. math::
        \sum_{t=0}^{\infty}
        \left\{
            x_t' r_i x_t + 2 x_t' w_i u_{it} + u_{it}' q_i u_{it} +
            u_{jt}' s_i u_{jt} + 2 u_{jt}' m_i u_{it}
        \right\}

    subject to the law of motion

    .. math::
        x_{t+1} = A x_t + b_1 u_{1t} + b_2 u_{2t}

    and a perceived control law :math:`u_j(t) = - f_j x_t` for the other
    player.

    The solution computed in this routine is the :math:`f_i` and
    :math:`p_i` of the associated double optimal linear regulator
    problem.

    Parameters
    ----------
    A : scalar(float) or array_like(float)
        Corresponds to the above equation, should be of size (n, n)
    B1 : scalar(float) or array_like(float)
        As above, size (n, k_1)
    B2 : scalar(float) or array_like(float)
        As above, size (n, k_2)
    R1 : scalar(float) or array_like(float)
        As above, size (n, n)
    R2 : scalar(float) or array_like(float)
        As above, size (n, n)
    Q1 : scalar(float) or array_like(float)
        As above, size (k_1, k_1)
    Q2 : scalar(float) or array_like(float)
        As above, size (k_2, k_2)
    S1 : scalar(float) or array_like(float)
        As above, size (k_1, k_1)
    S2 : scalar(float) or array_like(float)
        As above, size (k_2, k_2)
    W1 : scalar(float) or array_like(float)
        As above, size (n, k_1)
    W2 : scalar(float) or array_like(float)
        As above, size (n, k_2)
    M1 : scalar(float) or array_like(float)
        As above, size (k_2, k_1)
    M2 : scalar(float) or array_like(float)
        As above, size (k_1, k_2)
    beta : scalar(float), optional(default=1.0)
        Discount rate
    tol : scalar(float), optional(default=1e-8)
        This is the tolerance level for convergence
    max_iter : scalar(int), optional(default=1000)
        This is the maximum number of iterations allowed

    Returns
    -------
    F1 : array_like, dtype=float, shape=(k_1, n)
        Feedback law for agent 1
    F2 : array_like, dtype=float, shape=(k_2, n)
        Feedback law for agent 2
    P1 : array_like, dtype=float, shape=(n, n)
        The steady-state solution to the associated discrete matrix
        Riccati equation for agent 1
    P2 : array_like, dtype=float, shape=(n, n)
        The steady-state solution to the associated discrete matrix
        Riccati equation for agent 2

    """
    # == Unload parameters and make sure everything is an array == #
    params = A, B1, B2, R1, R2, Q1, Q2, S1, S2, W1, W2, M1, M2
    params = map(np.asarray, params)
    A, B1, B2, R1, R2, Q1, Q2, S1, S2, W1, W2, M1, M2 = params

    # == Multiply A, B1, B2 by sqrt(beta) to enforce discounting == #
    A, B1, B2 = [np.sqrt(beta) * x for x in (A, B1, B2)]

    n = A.shape[0]

    if B1.ndim == 1:
        k_1 = 1
        B1 = np.reshape(B1, (n, 1))
    else:
        k_1 = B1.shape[1]

    if B2.ndim == 1:
        k_2 = 1
        B2 = np.reshape(B2, (n, 1))
    else:
        k_2 = B2.shape[1]

    # == Initialize: identity blocks, zero Riccati matrices, random policies == #
    v1 = eye(k_1)
    v2 = eye(k_2)
    P1 = np.zeros((n, n))
    P2 = np.zeros((n, n))
    F1 = np.random.randn(k_1, n)
    F2 = np.random.randn(k_2, n)

    for it in range(max_iter):
        # update
        F10 = F1
        F20 = F2

        G2 = solve(dot(B2.T, P2.dot(B2)) + Q2, v2)
        G1 = solve(dot(B1.T, P1.dot(B1)) + Q1, v1)
        H2 = dot(G2, B2.T.dot(P2))
        H1 = dot(G1, B1.T.dot(P1))

        # == Solve jointly for the mutual best-response policies F1, F2 == #
        F1_left = v1 - dot(H1.dot(B2) + G1.dot(M1.T),
                           H2.dot(B1) + G2.dot(M2.T))
        F1_right = H1.dot(A) + G1.dot(W1.T) - \
            dot(H1.dot(B2) + G1.dot(M1.T), H2.dot(A) + G2.dot(W2.T))
        F1 = solve(F1_left, F1_right)
        F2 = H2.dot(A) + G2.dot(W2.T) - \
            dot(H2.dot(B1) + G2.dot(M2.T), F1)

        # == Update the Riccati matrices given the new policies == #
        Lambda1 = A - B2.dot(F2)
        Lambda2 = A - B1.dot(F1)
        Pi1 = R1 + dot(F2.T, S1.dot(F2))
        Pi2 = R2 + dot(F1.T, S2.dot(F1))

        P1 = dot(Lambda1.T, P1.dot(Lambda1)) + Pi1 - \
            dot(dot(Lambda1.T, P1.dot(B1)) + F2.T.dot(M1), F1)
        P2 = dot(Lambda2.T, P2.dot(Lambda2)) + Pi2 - \
            dot(dot(Lambda2.T, P2.dot(B2)) + F1.T.dot(M2), F2)

        dd = np.max(np.abs(F10 - F1)) + np.max(np.abs(F20 - F2))

        if dd < tol:  # success!
            break

    else:
        msg = 'No convergence: Iteration limit of {0} reached in nnash'
        raise ValueError(msg.format(max_iter))

    return F1, F2, P1, P2
Let's use these procedures to treat some applications, starting with the duopoly model
The duopoly case To map the duopoly model into a coupled linear-quadratic dynamic programming problem, define the state and controls as

$$x_t := \begin{bmatrix} 1 \\ q_{1t} \\ q_{2t} \end{bmatrix} \quad \text{and} \quad u_{it} := q_{i,t+1} - q_{it}, \quad i = 1, 2$$

If we write $x_t' R_i x_t + u_{it}' Q_i u_{it}$ for $-\pi_{it}$, where $Q_1 = Q_2 = \gamma$,

$$R_1 := \begin{bmatrix} 0 & -\frac{a_0}{2} & 0 \\ -\frac{a_0}{2} & a_1 & \frac{a_1}{2} \\ 0 & \frac{a_1}{2} & 0 \end{bmatrix} \quad \text{and} \quad R_2 := \begin{bmatrix} 0 & 0 & -\frac{a_0}{2} \\ 0 & 0 & \frac{a_1}{2} \\ -\frac{a_0}{2} & \frac{a_1}{2} & a_1 \end{bmatrix}$$

then we recover the one-period payoffs (2.114). The law of motion for the state $x_t$ is (2.118) with

$$A := \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad B_1 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad B_2 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$
The optimal decision rule of firm $i$ will take the form $u_{it} = -F_i x_t$, inducing the following closed loop system for the Markov perfect equilibrium:

$$x_{t+1} = (A - B_1 F_1 - B_2 F_2)\, x_t \tag{2.125}$$
Parameters and Solution Consider the previously presented duopoly model with parameter values of:

- $a_0 = 10$
- $a_1 = 2$
- $\beta = 0.96$
- $\gamma = 12$
From these we compute the infinite horizon MPE using the preceding code
"""
@authors: Chase Coleman, Thomas Sargent, John Stachurski
Markov Perfect Equilibrium for the simple duopoly example.
See the lecture at http://quant-econ.net/py/markov_perfect.html for a
description of the model.
321
= np.eye(3)
0.29512482
0.07584666
0.07584666]]
0.29512482]]
One way to see that $F_1$ is indeed optimal for firm 1 taking $F_2$ as given is to use QuantEcon's LQ class

In particular, let's take $F_2$ as computed above, plug it into (2.119) and (2.120) to get firm 1's problem, and solve it using LQ

We hope that the resulting policy will agree with $F_1$ as computed above
In [2]: Lambda1 = A - np.dot(B2, F2)
In [3]: lq1 = qe.LQ(Q1, R1, Lambda1, B1, beta=beta)
In [4]: P1_ih, F1_ih, d = lq1.stationary_values()
In [5]: F1_ih
Out[5]: array([[-0.66846611,  0.29512481,  0.07584666]])
This is close enough for rock and roll, as they say in the trade
Indeed, np.allclose agrees with our assessment
In [6]: np.allclose(F1, F1_ih)
Out[6]: True
Dynamics Let's now investigate the dynamics of price and output in this simple duopoly model under the MPE policies

Given our optimal policies $F_1$ and $F_2$, the state evolves according to (2.125)

The following program

- imports $F_1$ and $F_2$ from the previous program along with all parameters
- computes the evolution of $x_t$ using (2.125)
- extracts and plots industry output $q_t = q_{1t} + q_{2t}$ and price $p_t = a_0 - a_1 q_t$
import matplotlib.pyplot as plt
from duopoly_mpe import *

AF = A - B1.dot(F1) - B2.dot(F2)
n = 20
x = np.empty((3, n))
x[:, 0] = 1, 1, 1
for t in range(n-1):
    x[:, t+1] = np.dot(AF, x[:, t])
q1 = x[1, :]
q2 = x[2, :]
q = q1 + q2       # Total output, MPE
p = a0 - a1 * q   # Price, MPE

fig, ax = plt.subplots(figsize=(9, 5.8))
ax.plot(q, 'b-', lw=2, alpha=0.75, label='total output')
ax.plot(p, 'g-', lw=2, alpha=0.75, label='price')
ax.set_title('Output and prices, duopoly MPE')
ax.legend(frameon=False)
plt.show()
Note that the initial condition has been set to q10 = q20 = 1.0
The resulting figure looks as follows
To gain some perspective we can compare this to what happens in the monopoly case
The first panel in the next figure compares output of the monopolist and industry output under
the MPE, as a function of time
The second panel shows analogous curves for price
Here parameters are the same as above for both the MPE and monopoly solutions
The monopolist's initial condition is $q_0 = 2.0$, to mimic the industry initial condition $q_{10} = q_{20} = 1.0$ in the MPE case
As expected, output is higher and prices are lower under duopoly than monopoly
Exercises
Exercise 1 Replicate the pair of figures showing the comparison of output and prices for the monopolist and duopoly under MPE
Parameters are as in duopoly_mpe.py and you can use that code to compute MPE policies under
duopoly
The optimal policy in the monopolist case can be computed using QuantEcon's LQ class
Exercise 2 In this exercise we consider a slightly more sophisticated duopoly problem
It takes the form of an infinite horizon linear quadratic game proposed by [Judd1990]
Two firms set prices and quantities of two goods interrelated through their demand curves
Relevant variables are defined as follows:
The parameter values for the game are

δ = 0.02
D = np.array([[-1, 0.5], [0.5, -1]])
b = np.array([25, 25])
c = np.array([1, -2, 1])
e = np.array([10, 10, 3])
Solutions
Solution notebook
Contents
Markov Asset Pricing
Overview
Pricing Models
Finite Markov Asset Pricing
Implementation
Exercises
Solutions
"A little knowledge of geometric series goes a long way" (Robert E. Lucas, Jr.)
Overview
We begin with some notation and then proceed to foundational pricing models
In what follows let $d_0, d_1, \ldots$ be a stream of dividends

- A time-$t$ cum-dividend asset is a claim to the stream $d_t, d_{t+1}, \ldots$
- A time-$t$ ex-dividend asset is a claim to the stream $d_{t+1}, d_{t+2}, \ldots$
Risk Neutral Pricing Let $\beta = 1/(1+\rho)$ be an intertemporal discount factor

In other words, $\rho$ is the rate at which agents discount the future
The basic risk-neutral asset pricing equation for pricing one unit of a cum-dividend asset is

$$p_t = d_t + \beta\, \mathbb{E}_t[p_{t+1}] \tag{2.126}$$
Here $\mathbb{E}_t[y]$ denotes the best forecast of $y$, conditioned on information available at time $t$
In the present case this information set consists of observations of dividends up until time t
For an ex-dividend asset (buy today in exchange for the asset and dividend tomorrow), the basic
risk-neutral asset pricing equation is
$$p_t = \beta\, \mathbb{E}_t[d_{t+1} + p_{t+1}] \tag{2.127}$$
(2.127)
Pricing Under Risk Aversion Let's now introduce risk aversion by supposing that all agents evaluate payoffs according to a strictly concave period utility function $u$

In this setting Robert Lucas [Luc78] showed that under certain equilibrium conditions the price of an ex-dividend asset obeys the famous consumption-based asset pricing equation

$$p_t = \mathbb{E}_t\left[\beta \frac{u'(d_{t+1})}{u'(d_t)} (d_{t+1} + p_{t+1})\right] \tag{2.128}$$

Comparing (2.127) and (2.128), the difference is that $\beta$ in (2.127) has been replaced by

$$\beta \frac{u'(d_{t+1})}{u'(d_t)}$$

In what follows we will often take the period utility function to be

$$u(c) = \frac{c^{1-\gamma}}{1-\gamma} \text{ with } \gamma > 0 \qquad \text{or} \qquad u(c) = \ln c$$
With constant dividends $d_t = d$ and risk neutrality, iterating on (2.126) gives

$$p_t = d + \beta(d + \beta p_{t+2}) = \cdots = d + \beta d + \beta^2 d + \cdots + \beta^{k-1} d + \beta^k p_{t+k}$$
Taking $k \to \infty$ and assuming $\beta^k p_{t+k} \to 0$, the price of a claim to the constant stream is

$$p_t = \frac{d}{1-\beta}$$

Next consider a growth stock whose dividend grows at the constant rate $\lambda$, so that $d_t = \lambda^t d_0$. The cum-dividend price is then

$$p_t = \frac{d_t}{1 - \lambda\beta} = \frac{\lambda^t d_0}{1 - \lambda\beta} \tag{2.129}$$

(Hint: Set $v_t = p_t/d_t$ in (2.126) and then $v_t = v_{t+1} = v$ to solve for constant $v$)

The ex-dividend price is $p_t = \lambda\beta (1 - \lambda\beta)^{-1} d_t$

If, in this example, we take $\lambda = 1 + g$, then the ex-dividend price becomes

$$p_t = \frac{1+g}{\rho - g}\, d_t$$
Next, suppose the dividend process satisfies

$$d_{t+1} = \lambda_{t+1} d_t \quad \text{where} \quad \{\lambda_t\} \sim MC(P, s)$$

This notation means that $\{\lambda_t\}$ is an $n$ state Markov chain with transition matrix $P$ and state space $s = \{s_1, \ldots, s_n\}$
To obtain asset prices under risk neutrality, recall that in (2.129) the price-dividend ratio $p_t/d_t$ is constant and depends on $\lambda$

This encourages us to guess that, in the current case, $p_t/d_t$ is constant given $\lambda_t$

That is, $p_t = v(\lambda_t)\, d_t$ for some unknown function $v$ on the state space

To simplify notation, let $v_i := v(s_i)$

For a cum-dividend stock we find that $v_i = 1 + \beta \sum_{j=1}^n P_{ij} s_j v_j$

Letting $\mathbf{1}$ be an $n \times 1$ vector of ones and $\tilde P_{ij} = P_{ij} s_j$, we can express this in matrix notation as

$$v = (I - \beta \tilde P)^{-1} \mathbf{1}$$

Here we are assuming invertibility, which requires that the growth rate of the Markov chain is not too large relative to $\beta$
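This formula is straightforward to evaluate numerically. The sketch below uses illustrative primitives (the transition matrix, states and discount factor are assumptions chosen purely for the example):

import numpy as np

n = 2
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])        # transition matrix (assumed)
s = np.array([1.02, 0.98])        # growth-rate states (assumed)
beta = 0.95
P_tilde = P * s                   # P_tilde[i, j] = P[i, j] * s[j]
v = np.linalg.solve(np.eye(n) - beta * P_tilde, np.ones(n))
print(v)                          # price-dividend ratio in each state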
For the remainder of the lecture we focus on computing asset prices when
endowments follow a finite state Markov chain
agents are risk averse, and prices obey (2.128)
Our finite state Markov setting emulates [MP85]
In particular, we'll assume that there is an endowment of a consumption good that follows

$$c_{t+1} = \lambda_{t+1} c_t \tag{2.130}$$

where

$$\{\lambda_t\} \sim MC(P, s) \tag{2.131}$$

Drawing intuition from our earlier discussion on pricing with Markov growth, we guess a pricing function of the form $p_t = v(\lambda_t)\, c_t$ where $v$ is yet to be determined
Inserting the CRRA specification into (2.128), with consumption playing the role of dividends, gives

$$v_i = \beta \sum_{j=1}^{n} P_{ij} s_j^{1-\gamma} (1 + v_j) \tag{2.132}$$

Setting $\tilde P_{ij} := P_{ij} s_j^{1-\gamma}$, we can rewrite this in vector form as $v = \beta \tilde P \mathbf{1} + \beta \tilde P v$

Assuming again that the eigenvalues of $\beta \tilde P$ are strictly less than 1 in modulus, we can solve this to yield

$$v = (I - \beta \tilde P)^{-1} \beta \tilde P \mathbf{1} \tag{2.133}$$

With log preferences, $\gamma = 1$ and hence $s^{1-\gamma} = \mathbf{1}$, from which we obtain

$$v = \frac{\beta}{1-\beta}\, \mathbf{1}$$

Thus, with log preferences, the price-dividend ratio for a Lucas tree is constant
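Formula (2.133) can be checked numerically along the same lines as the risk-neutral case above; the primitives below are again illustrative assumptions.

import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])        # assumed transition matrix
s = np.array([1.02, 0.98])        # assumed consumption growth states
beta, gamma = 0.95, 2.0
P_tilde = P * s**(1 - gamma)      # P_tilde[i, j] = P[i, j] * s[j]**(1 - gamma)
v = np.linalg.solve(np.eye(2) - beta * P_tilde, beta * P_tilde @ np.ones(2))
print(v)                          # price-dividend ratio of the Lucas tree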
A Risk-Free Consol Consider the same pure exchange representative agent economy
A risk-free consol promises to pay a constant amount $\zeta > 0$ each period

Recycling notation, let $p_t$ now be the price of an ex-coupon claim to the consol

An ex-coupon claim to the consol entitles the owner at the end of period $t$ to

- $\zeta$ in period $t+1$, plus
- the right to sell the claim for $p_{t+1}$ next period
The price satisfies

$$u'(c_t)\, p_t = \beta\, \mathbb{E}_t\left[u'(c_{t+1})(\zeta + p_{t+1})\right]$$

Substituting $u'(c) = c^{-\gamma}$ into the above and using $c_{t+1} = \lambda_{t+1} c_t$ yields

$$p_t = \mathbb{E}_t\left[\beta \lambda_{t+1}^{-\gamma} (\zeta + p_{t+1})\right] \tag{2.134}$$

It is natural to guess that the price depends only on the state, so let $p_i$ be the price when $\lambda_t = s_i$. Then

$$p_i = \beta \sum_{j=1}^{n} P_{ij} s_j^{-\gamma} (\zeta + p_j)$$

With $\check P_{ij} := P_{ij} s_j^{-\gamma}$, this can be expressed as $p = \beta \check P \zeta \mathbf{1} + \beta \check P p$, which can be solved to give

$$p = (I - \beta \check P)^{-1} \beta \check P \zeta \mathbf{1} \tag{2.135}$$
Pricing an Option to Purchase the Consol Let's now price options of varying maturity that give the right to purchase a consol at a price $p_S$

An infinite horizon call option We want to price an infinite horizon option to purchase a consol at a price $p_S$

The option entitles the owner at the beginning of a period either to

1. purchase the bond at price $p_S$ now, or
2. hold the option until next period

Thus, the owner either exercises the option now, or chooses not to exercise and waits until next period

This is termed an infinite-horizon call option with strike price $p_S$

The owner of the option is entitled to purchase the consol at the price $p_S$ at the beginning of any period, after the coupon has been paid to the previous owner of the bond
The economy is identical with the one above
Let $w(\lambda_t, p_S)$ be the value of the option when the time $t$ growth state is known to be $\lambda_t$ but before the owner has decided whether or not to exercise the option at time $t$ (i.e., today)

Recalling that $p(\lambda_t)$ is the value of the consol when the initial growth state is $\lambda_t$, the value of the option satisfies

$$w(\lambda_t, p_S) = \max\left\{ \beta\, \mathbb{E}_t \frac{u'(c_{t+1})}{u'(c_t)}\, w(\lambda_{t+1}, p_S),\;\; p(\lambda_t) - p_S \right\}$$

The first term on the right is the value of waiting, while the second is the value of exercising

We can also write this as

$$w(s_i, p_S) = \max\left\{ \beta \sum_{j=1}^{n} P_{ij} s_j^{-\gamma}\, w(s_j, p_S),\;\; p(s_i) - p_S \right\} \tag{2.136}$$
Letting $\check P_{ij} = P_{ij} s_j^{-\gamma}$ and $w_i = w(s_i, p_S)$, we can express (2.136) as the nonlinear vector equation

$$w = \max\{\beta \check P w,\; p - p_S \mathbf{1}\} \tag{2.137}$$

To solve (2.137), form the operator $T$ mapping vector $w$ into vector $Tw$ via

$$Tw = \max\{\beta \check P w,\; p - p_S \mathbf{1}\}$$

Start at some initial $w$ and iterate to convergence with $T$
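In code, this iteration can be sketched as follows. Here p is the consol price vector from (2.135) and P_check is the matrix with entries $P_{ij} s_j^{-\gamma}$, both assumed to have been computed already.

import numpy as np

def consol_option_value(p, P_check, beta, p_s, tol=1e-8):
    "Iterate w -> max(beta * P_check w, p - p_s) to convergence, as in (2.137)."
    w = np.zeros_like(p)
    while True:
        w_new = np.maximum(beta * P_check @ w, p - p_s)
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new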
Finite-horizon options Finite horizon options obey functional equations closely related to (2.136)

A $k$ period option expires after $k$ periods

At time $t$, a $k$ period option gives the owner the right to exercise the option to purchase the risk-free consol at the strike price $p_S$ at $t, t+1, \ldots, t+k-1$

The option expires at time $t+k$

Thus, for $k = 1, 2, \ldots$, let $w(s_i, k)$ be the value of a $k$-period option

It obeys

$$w(s_i, k) = \max\left\{ \beta \sum_{j=1}^{n} P_{ij} s_j^{-\gamma}\, w(s_j, k-1),\;\; p(s_i) - p_S \right\}, \qquad k = 1, 2, \ldots \text{ with } w(s_i, 0) = 0$$

In vector notation, with $w^k_i := w(s_i, k)$, this becomes

$$w^k = \max\{\beta \check P w^{k-1},\; p - p_S \mathbf{1}\}, \qquad k = 1, 2, \ldots \text{ with } w^0 = 0$$
Risk-free interest rates The stochastic discount factor in this economy is

$$m_{t+1} = \beta \left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} = \beta \lambda_{t+1}^{-\gamma}$$

It follows that the reciprocal $R_t^{-1}$ of the gross risk-free interest rate $R_t$ is

$$\mathbb{E}_t\, m_{t+1} = \beta \sum_{j=1}^{n} P_{ij} s_j^{-\gamma}$$

when $\lambda_t = s_i$, or, in matrix form,

$$m_1 = \beta \check P \mathbf{1}$$

where the $i$-th element of $m_1$ is the reciprocal of the one-period gross risk-free interest rate when $\lambda_t = s_i$

$j$ period risk-free interest rates Let $m_j$ be an $n \times 1$ vector whose $i$-th component is the reciprocal of the $j$-period gross risk-free interest rate when $\lambda_t = s_i$

Then $m_1 = \beta \check P \mathbf{1}$, and $m_{j+1} = \beta \check P m_j$ for $j \geq 1$
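The recursion for the $m_j$ vectors is easy to implement; the sketch below assumes P_check (the matrix with entries $P_{ij} s_j^{-\gamma}$) is already available.

import numpy as np

def risk_free_discounts(P_check, beta, J):
    "Return [m_1, ..., m_J] via m_{j+1} = beta * P_check m_j with m_0 = 1."
    m = np.ones(P_check.shape[0])
    out = []
    for _ in range(J):
        m = beta * P_check @ m
        out.append(m.copy())   # out[j-1][i]: reciprocal of j-period gross rate in state i
    return out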
The class AssetPrices from the QuantEcon package provides methods for computing some of the
prices described above
We print the code here for convenience
Exercises
Exercise 1 Compute the price of the Lucas tree in an economy with the following primitives
n = 5
P = 0.0125 * np.ones((n, n))
P += np.diag(0.95 - 0.0125 * np.ones(5))
s = np.array([1.05, 1.025, 1.0, 0.975, 0.95])   # state values
gamma = 2.0
beta = 0.94
zeta = 1.0
Using the same set of primitives, compute the price of the risk-free consol when $\zeta = 1$

Do the same for the call option on the consol when $p_S = 150.0$

Compute the value of the option at dates T = [10, 20, 30]
Solutions
Solution notebook
Overview
This lecture describes a rational expectations version of the famous permanent income model of
Friedman [Fri56]
In this section we state and solve the savings and consumption problem faced by the consumer
Preliminaries The discussion below requires a casual familiarity with martingales
A discrete time martingale is a stochastic process (i.e., a sequence of random variables) $\{X_t\}$ with finite mean and satisfying

$$\mathbb{E}_t[X_{t+1}] = X_t, \qquad t = 0, 1, 2, \ldots$$
Martingales have the feature that the history of past outcomes provides no predictive power for
changes between current and future outcomes
For example, the current wealth of a gambler engaged in a fair game has this property
One common class of martingales is the family of random walks
A random walk is a stochastic process { Xt } that satisfies
X t +1 = X t + w t +1
for some iid zero mean innovation sequence {wt }
Evidently $X_t$ can also be expressed as

$$X_t = \sum_{j=1}^{t} w_j + X_0$$
Not every martingale arises as a random walk (see, for example, Wald's martingale)
The Decision Problem A consumer has preferences over consumption streams that are ordered
by the utility functional
"
#
E 0 t u(ct )
(2.138)
t =0
where

- $c_t$ is time $t$ consumption
- $u$ is a strictly concave one-period utility function
- $\beta \in (0, 1)$ is a discount factor

The consumer maximizes (2.138) by choosing a consumption, borrowing plan $\{c_t, b_{t+1}\}_{t=0}^{\infty}$ subject to the sequence of budget constraints

$$b_{t+1} = (1 + r)(c_t + b_t - y_t), \qquad t \geq 0 \tag{2.139}$$
Here

- $y_t$ is an exogenous endowment process
- $r > 0$ is the risk-free interest rate
- $b_t$ is one-period risk-free debt maturing at $t$
- $b_0$ is a given initial condition
Assumptions For the remainder of this lecture, we follow Friedman and Hall in assuming that $(1+r)^{-1} = \beta$

Regarding the endowment process, we assume it has the state-space representation

$$x_{t+1} = A x_t + C w_{t+1} \tag{2.140}$$
$$y_t = U x_t \tag{2.141}$$

where

- $\{w_t\}$ is an iid vector process with $\mathbb{E}\, w_t = 0$ and $\mathbb{E}\, w_t w_t' = I$
- the spectral radius of $A$ satisfies $\rho(A) < 1/\sqrt{\beta}$
- $U$ is a selection vector that pins down $y_t$ as a particular linear combination of the elements of $x_t$.

The restriction on $\rho(A)$ prevents income from growing so fast that certain sums become infinite
We also impose the no Ponzi scheme condition

$$\mathbb{E}_0\left[\sum_{t=0}^{\infty} \beta^t b_t^2\right] < \infty \tag{2.142}$$

This condition rules out an always-borrow scheme that would allow the household to enjoy unbounded or bliss consumption forever
First-Order Conditions The first-order conditions for maximizing (2.138) subject to (2.139) are

$$\mathbb{E}_t[u'(c_{t+1})] = u'(c_t), \qquad t = 0, 1, \ldots \tag{2.143}$$
These equations are also known as the Euler equations for the model
If youre not sure where they come from, you can find a proof sketch in the appendix
With our quadratic preference specification, (2.143) has the striking implication that consumption follows a martingale:

$$\mathbb{E}_t[c_{t+1}] = c_t \tag{2.144}$$
(In fact quadratic preferences are necessary for this conclusion⁸)
One way to interpret (2.144) is that consumption will only change when new information about
permanent income is revealed
These ideas will be clarified below
The Optimal Decision Rule The state vector confronting the household at $t$ is $(b_t, x_t)$
Here
xt is an exogenous component, unaffected by household behavior
bt is an endogenous component (since it depends on the decision rule)
Note that xt contains all variables useful for forecasting the households future endowment
Now let's deduce the optimal decision rule⁹
Note: One way to solve the consumers problem is to apply dynamic programming as in this lecture.
We do this later. But first we use an alternative approach that is revealing and shows the work
that dynamic programming does for us automatically
We want to solve the system of difference equations formed by (2.139) and (2.144) subject to the
boundary condition (2.142)
To accomplish this, observe first that (2.142) implies $\lim_{t\to\infty} \beta^t b_{t+1} = 0$
⁸ A linear marginal utility is essential for deriving (2.144) from (2.143). Suppose instead that we had imposed the following more standard assumptions on the utility function: $u'(c) > 0$, $u''(c) < 0$, $u'''(c) > 0$ and required that $c \geq 0$. The Euler equation remains (2.143). But the fact that $u''' > 0$ implies via Jensen's inequality that $\mathbb{E}_t[u'(c_{t+1})] > u'(\mathbb{E}_t[c_{t+1}])$. This inequality together with (2.143) implies that $\mathbb{E}_t[c_{t+1}] > c_t$ (consumption is said to be a submartingale), so that consumption stochastically diverges to $+\infty$. The consumer's savings also diverge to $+\infty$.
⁹ An optimal decision rule is a map from current state into current actions (in this case, consumption)
Using this restriction on the debt path and solving (2.139) forward yields

$$b_t = \sum_{j=0}^{\infty} \beta^j (y_{t+j} - c_{t+j}) \tag{2.145}$$

Take conditional expectations on both sides of (2.145) and use the law of iterated expectations to deduce

$$b_t = \sum_{j=0}^{\infty} \beta^j\, \mathbb{E}_t[y_{t+j}] - \frac{c_t}{1-\beta} \tag{2.146}$$
Expressed in terms of $c_t$ we get

$$c_t = (1-\beta)\left[\sum_{j=0}^{\infty} \beta^j\, \mathbb{E}_t[y_{t+j}] - b_t\right] \tag{2.147}$$

Using the fact that $\beta = \frac{1}{1+r}$, we can also write this as

$$c_t = \frac{r}{1+r}\left[\sum_{j=0}^{\infty} \beta^j\, \mathbb{E}_t[y_{t+j}] - b_t\right]$$
These last two equations assert that consumption equals economic income

- financial wealth equals $-b_t$
- non-financial wealth equals $\sum_{j=0}^{\infty} \beta^j\, \mathbb{E}_t[y_{t+j}]$
- the marginal propensity to consume out of wealth equals the interest factor $\frac{r}{1+r}$

A useful refinement is that the geometric sum of expected future income can be computed directly from the state-space representation (2.140)-(2.141):

$$\sum_{j=0}^{\infty} \beta^j\, \mathbb{E}_t[y_{t+j}] = \mathbb{E}_t\left[\sum_{j=0}^{\infty} \beta^j y_{t+j}\right] = U(I - \beta A)^{-1} x_t$$
Using this expression, we can obtain a linear state-space system governing consumption, debt and income:

$$x_{t+1} = A x_t + C w_{t+1} \tag{2.148}$$
$$b_{t+1} = b_t + U\left[(I - \beta A)^{-1}(A - I)\right] x_t \tag{2.149}$$
$$y_t = U x_t \tag{2.150}$$
$$c_t = (1-\beta)\left[U(I - \beta A)^{-1} x_t - b_t\right] \tag{2.151}$$
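The system (2.148)-(2.151) is easy to simulate directly. The sketch below uses the primitives of the iid income example discussed next (with $\mu = 1$ and $\sigma = 0.15$); these values are illustrative assumptions, not unique choices.

import numpy as np

A = np.array([[0.0, 0.0],
              [0.0, 1.0]])
C = np.array([[0.15],
              [0.0]])
U = np.array([[1.0, 1.0]])
r = 0.05
beta = 1 / (1 + r)

M = np.linalg.inv(np.eye(2) - beta * A)
G = U @ M                       # U (I - beta A)^{-1}, used in (2.151)
K = U @ M @ (A - np.eye(2))     # coefficient on x_t in (2.149)

T = 60
x = np.array([[0.0], [1.0]])
b = 0.0
c_path, b_path = [], []
for t in range(T):
    c_path.append(((1 - beta) * (G @ x - b)).item())   # consumption (2.151)
    b = b + (K @ x).item()                             # debt update (2.149)
    x = A @ x + C * np.random.randn()                  # state update (2.148)
    b_path.append(b)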
A Simple Example with iid Income To gain some preliminary intuition on the implications of (2.148), let's look at a highly stylized example where income is just iid

(Later examples will investigate more realistic income streams)

In particular, let $\{w_t\}_{t=1}^{\infty}$ be iid and scalar standard normal, and let

$$x_t = \begin{bmatrix} x_t^1 \\ 1 \end{bmatrix}, \quad A = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \quad U = \begin{bmatrix} 1 & \mu \end{bmatrix}, \quad C = \begin{bmatrix} \sigma \\ 0 \end{bmatrix}$$

Finally, let $b_0 = x_0^1 = 0$

Under these assumptions we have $y_t = \mu + \sigma w_t \sim N(\mu, \sigma^2)$

Further, if you work through the state space representation, you will see that

$$b_t = -\sigma \sum_{j=1}^{t-1} w_j \qquad \text{and} \qquad c_t = \mu + (1-\beta)\sigma \sum_{j=1}^{t} w_j$$
Thus income is iid and debt and consumption are both Gaussian random walks

Defining assets as $-b_t$, we see that assets are just the cumulative sum of unanticipated income prior to the present date

The next figure shows a typical realization with $r = 0.05$, $\mu = 1$ and $\sigma = 0.15$
Observe that consumption is considerably smoother than income
The figure below shows the consumption paths of 250 consumers with independent income
streams
The code for these figures can be found in perm_inc_figs.py
Alternative Representations
In this section we shed more light on the evolution of savings, debt and consumption by representing their dynamics in several different ways
Hall's Representation Hall [Hal78] suggests a sharp way to summarize the implications of LQ permanent income theory
First, to represent the solution for $b_t$, shift (2.147) forward one period and eliminate $b_{t+1}$ by using (2.139) to obtain

$$c_{t+1} = (1-\beta)\sum_{j=0}^{\infty} \beta^j\, \mathbb{E}_{t+1}[y_{t+j+1}] - (1-\beta)\beta^{-1}(c_t + b_t - y_t)$$

If we add and subtract $\beta^{-1}(1-\beta)\sum_{j=0}^{\infty} \beta^j\, \mathbb{E}_t[y_{t+j}]$ from the right side of the preceding equation and rearrange, we obtain

$$c_{t+1} - c_t = (1-\beta)\sum_{j=0}^{\infty} \beta^j \left\{\mathbb{E}_{t+1}[y_{t+j+1}] - \mathbb{E}_t[y_{t+j+1}]\right\} \tag{2.152}$$

The right side is the time $t+1$ innovation to the expected present value of the endowment process $\{y_t\}$
We can represent the optimal decision rule for $c_t, b_{t+1}$ in the form of (2.152) and (2.146), which is repeated here:

$$b_t = \sum_{j=0}^{\infty} \beta^j\, \mathbb{E}_t[y_{t+j}] - \frac{1}{1-\beta}\, c_t \tag{2.153}$$
Equation (2.153) asserts that the household's debt due at $t$ equals the expected present value of its endowment minus the expected present value of its consumption stream

A high debt thus indicates a large expected present value of surpluses $y_t - c_t$
Recalling again our discussion on forecasting geometric sums, we have

$$\mathbb{E}_t \sum_{j=0}^{\infty} \beta^j y_{t+j} = U(I - \beta A)^{-1} x_t$$
Using these formulas together with (2.140) and substituting into (2.152) and (2.153) gives the following representation for the consumer's optimal decision rule:

$$c_{t+1} = c_t + (1-\beta) U (I - \beta A)^{-1} C w_{t+1} \tag{2.154}$$
$$b_t = U(I - \beta A)^{-1} x_t - \frac{1}{1-\beta}\, c_t \tag{2.155}$$
$$y_t = U x_t \tag{2.156}$$
$$x_{t+1} = A x_t + C w_{t+1} \tag{2.157}$$
It follows from (2.155) that the cointegrating residual satisfies

$$(1-\beta)\, b_t + c_t = (1-\beta)\, U (I - \beta A)^{-1} x_t \tag{2.158}$$

Equivalently, using (2.153),

$$(1-\beta)\, b_t + c_t = (1-\beta)\, \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j y_{t+j} \tag{2.159}$$
Equation (2.159) asserts that the cointegrating residual on the left side equals the conditional expectation of the geometric sum of future incomes on the right¹¹
Cross-Sectional Implications Consider again (2.154), this time in light of our discussion of distribution dynamics in the lecture on linear systems

The dynamics of $c_t$ are given by

$$c_{t+1} = c_t + (1-\beta) U (I - \beta A)^{-1} C w_{t+1}$$

or

$$c_t = c_0 + \hat\sigma \sum_{j=1}^{t} \hat w_j \quad \text{for} \quad \hat w_{t+1} := \frac{1}{\hat\sigma}\, (1-\beta) U (I - \beta A)^{-1} C w_{t+1} \tag{2.160}$$

The unit root affecting $c_t$ causes the time $t$ variance of $c_t$ to grow linearly with $t$
¹⁰ This would be the case if, for example, the spectral radius of $A$ is strictly less than one
¹¹ See Campbell and Shiller (1988) and Lettau and Ludvigson (2001, 2004) for interesting applications of related ideas.
In particular, since $\{\hat w_t\}$ is iid with unit variance,

$$\text{Var}[c_t] = \text{Var}[c_0] + t\, \hat\sigma^2 \qquad \text{where} \qquad \hat\sigma^2 := (1-\beta)^2\, U (I - \beta A)^{-1} C C' (I - \beta A')^{-1} U' \tag{2.161}$$

Assuming that $\hat\sigma > 0$, this means that $\{c_t\}$ has no asymptotic distribution
Let's consider what this means for a cross-section of ex ante identical households born at time 0

Let the distribution of $c_0$ represent the cross-section of initial consumption values

Equation (2.161) tells us that the distribution of $c_t$ spreads out over time at a rate proportional to $\sqrt{t}$

A number of different studies have investigated this prediction (see, e.g., [DP94], [STY04])
Impulse Response Functions Impulse response functions measure the change in a dynamic system subject to a given impulse (i.e., temporary shock)

The impulse response function of $\{c_t\}$ to the innovation $\{w_t\}$ is a box

In particular, the response of $c_{t+j}$ to a unit increase in the innovation $w_{t+1}$ is $(1-\beta) U (I - \beta A)^{-1} C$ for all $j \geq 1$
Moving Average Representation It's useful to express the innovation to the expected present value of the endowment process in terms of a moving average representation for income $y_t$

The endowment process defined by (2.140) has the moving average representation

$$y_{t+1} = d(L)\, w_{t+1} \tag{2.162}$$

where $d(L) = \sum_{j=0}^{\infty} d_j L^j$ for some sequence $d_j$, and $L$ is the lag operator

It follows that

$$y_{t+j} - \mathbb{E}_t[y_{t+j}] = d_0 w_{t+j} + d_1 w_{t+j-1} + \cdots + d_{j-1} w_{t+1}$$

and hence

$$\mathbb{E}_{t+1}[y_{t+j}] - \mathbb{E}_t[y_{t+j}] = d_{j-1}\, w_{t+1} \tag{2.163}$$

Using (2.163) in (2.152) gives

$$c_{t+1} - c_t = (1-\beta)\, d(\beta)\, w_{t+1} \tag{2.164}$$
We illustrate some of the preceding ideas with the following two examples

In both examples, the endowment follows the process $y_t = x_{1t} + x_{2t}$ where

$$\begin{bmatrix} x_{1t+1} \\ x_{2t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} + \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix}\begin{bmatrix} w_{1t+1} \\ w_{2t+1} \end{bmatrix}$$

Here

- $w_{t+1}$ is an iid $2 \times 1$ process distributed as $N(0, I)$
- $x_{1t}$ is a permanent component of $y_t$
- $x_{2t}$ is a purely transitory component
Example 1 Assume as before that the consumer observes the state $x_t$ at time $t$

In view of (2.154) we have

$$c_{t+1} - c_t = \sigma_1 w_{1t+1} + (1-\beta)\sigma_2 w_{2t+1} \tag{2.165}$$

Formula (2.165) shows how an increment $\sigma_1 w_{1t+1}$ to the permanent component of income $x_{1t+1}$ leads to

- a permanent one-for-one increase in consumption and
- no increase in savings $-b_{t+1}$

But the purely transitory component of income $\sigma_2 w_{2t+1}$ leads to a permanent increment in consumption by a fraction $1-\beta$ of transitory income

The remaining fraction $\beta$ is saved, leading to a permanent increment in $-b_{t+1}$

Application of the formula for debt in (2.148) to this example shows that

$$b_{t+1} - b_t = -x_{2t} = -\sigma_2 w_{2t} \tag{2.166}$$

This confirms that none of $\sigma_1 w_{1t}$ is saved, while all of $\sigma_2 w_{2t}$ is saved
The next figure illustrates these very different reactions to transitory and permanent income
shocks using impulse-response functions
The code for generating this figure is in file examples/perm_inc_ir.py from the main repository,
as shown below
"""
Impulse response functions for the LQ permanent income model permanent and
transitory shocks.
"""
import numpy as np
import matplotlib.pyplot as plt
r
beta
T
S
sigma1
=
=
=
=
=
345
0.05
1 / (1 + r)
20 # Time horizon
5
# Impulse date
sigma2 = 0.15
def time_path(permanent=False):
"Time path of consumption and debt given shock sequence"
w1 = np.zeros(T+1)
w2 = np.zeros(T+1)
b = np.zeros(T+1)
c = np.zeros(T+1)
if permanent:
w1[S+1] = 1.0
else:
w2[S+1] = 1.0
for t in range(1, T):
b[t+1] = b[t] - sigma2 * w2[t]
c[t+1] = c[t] + sigma1 * w1[t+1] + (1 - beta) * sigma2 * w2[t+1]
return b, c
fig, axes = plt.subplots(2, 1)
plt.subplots_adjust(hspace=0.5)
p_args = {'lw': 2, 'alpha': 0.7}
346
L = 0.175
for ax in axes:
ax.grid(alpha=0.5)
ax.set_xlabel(r'Time')
ax.set_ylim(-L, L)
ax.plot((S, S), (-L, L), 'k-', lw=0.5)
ax = axes[0]
b, c = time_path(permanent=0)
ax.set_title('impulse-response, transitory income shock')
ax.plot(list(range(T+1)), c, 'g-', label="consumption", **p_args)
ax.plot(list(range(T+1)), b, 'b-', label="debt", **p_args)
ax.legend(loc='upper right')
ax = axes[1]
b, c = time_path(permanent=1)
ax.set_title('impulse-response, permanent income shock')
ax.plot(list(range(T+1)), c, 'g-', label="consumption", **p_args)
ax.plot(list(range(T+1)), b, 'b-', label="debt", **p_args)
ax.legend(loc='lower right')
plt.show()
Example 2 Assume now that at time $t$ the consumer observes $y_t$, and its history up to $t$, but not $x_t$

Under this assumption, it is appropriate to use an innovation representation to form $A, C, U$ in (2.154)

The discussion in sections 2.9.1 and 2.11.3 of [LS12] shows that the pertinent state space representation for $y_t$ is

$$\begin{bmatrix} y_{t+1} \\ a_{t+1} \end{bmatrix} = \begin{bmatrix} 1 & -(1-K) \\ 0 & 0 \end{bmatrix}\begin{bmatrix} y_t \\ a_t \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \end{bmatrix} a_{t+1}$$

$$y_t = \begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} y_t \\ a_t \end{bmatrix}$$

where

- $K :=$ the stationary Kalman gain
- $a_t := y_t - \mathbb{E}[y_t \mid y_{t-1}, \ldots, y_0]$

In the same discussion in [LS12] it is shown that $K \in [0, 1]$ and that $K$ increases as $\sigma_1/\sigma_2$ does

In other words, $K$ increases as the ratio of the standard deviation of the permanent shock to that of the transitory shock increases
Applying formulas (2.154) implies

$$c_{t+1} - c_t = [1 - \beta(1-K)]\, a_{t+1} \tag{2.167}$$
where the endowment process can now be represented in terms of the univariate innovation to $y_t$ as

$$y_{t+1} - y_t = a_{t+1} - (1-K)\, a_t \tag{2.168}$$
Equation (2.168) indicates that the consumer regards

- fraction $K$ of an innovation $a_{t+1}$ to $y_{t+1}$ as permanent
- fraction $1-K$ as purely transitory
The consumer permanently increases his consumption by the full amount of his estimate of the permanent part of $a_{t+1}$, but by only $(1-\beta)$ times his estimate of the purely transitory part of $a_{t+1}$

Therefore, in total he permanently increments his consumption by a fraction $K + (1-\beta)(1-K) = 1 - \beta(1-K)$ of $a_{t+1}$

He saves the remaining fraction $\beta(1-K)$
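A quick arithmetic check of this decomposition, with assumed illustrative values of $\beta$ and $K$:

beta, K = 0.95, 0.5                     # assumed illustrative values
consumed = K + (1 - beta) * (1 - K)     # permanent part plus (1-beta) of transitory part
print(consumed, 1 - beta * (1 - K))     # both equal 0.525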
According to equation (2.168), the first difference of income is a first-order moving average
Equation (2.167) asserts that the first difference of consumption is iid
Application of the formula for debt in (2.148) to this example shows that

$$b_{t+1} - b_t = (K - 1)\, a_t \tag{2.169}$$
This indicates how the fraction K of the innovation to yt that is regarded as permanent influences
the fraction of the innovation that is saved
Further Reading
The model described above significantly changed how economists think about consumption
At the same time, it's generally recognized that Hall's version of the permanent income hypothesis fails to capture all aspects of the consumption/savings data
For example, liquidity constraints and buffer stock savings appear to be important
Further discussion can be found in, e.g., [HM82], [Par99], [Dea91], [Car01]
Appendix: The Euler Equation In a two period version of the problem, the budget constraints imply

$$c_0 = \frac{b_1}{1+r} - b_0 + y_0 \qquad \text{and} \qquad c_1 = y_1 - b_1$$

Substituting these constraints into our two period objective $u(c_0) + \beta\, \mathbb{E}_0[u(c_1)]$ gives

$$\max_{b_1}\; u\left(\frac{b_1}{R} - b_0 + y_0\right) + \beta\, \mathbb{E}_0[u(y_1 - b_1)]$$

where $R := 1 + r$. You will be able to verify that the first order condition is

$$u'(c_0) = \beta R\, \mathbb{E}_0[u'(c_1)]$$

Using $\beta R = 1$ gives (2.143) in the two period case
The proof for the general case is not dissimilar
CHAPTER
THREE
ADVANCED APPLICATIONS
This advanced section of the course contains more complex applications, and can be read selectively, according to your interests
Overview
In a previous lecture we learned about finite Markov chains, a relatively elementary class of stochastic dynamic models
The present lecture extends this analysis to continuous (i.e., uncountable) state Markov chains
Most stochastic dynamic models studied by economists either fit directly into this class or can be
represented as continuous state Markov chains after minor modifications
In this lecture, our focus will be on continuous Markov models that
evolve in discrete time
are often nonlinear
The fact that we accommodate nonlinear models here is significant, because linear stochastic models have their own highly developed tool set, as we'll see later on
The question that interests us most is: Given a particular stochastic dynamic model, how will the
state of the system evolve over time?
In particular,
What happens to the distribution of the state variables?
Is there anything we can say about the average behavior of these variables?
Is there a notion of steady state or long run equilibrium that's applicable to the model?
If so, how can we compute it?
Answering these questions will lead us to revisit many of the topics that occupied us in the finite
state case, such as simulation, distribution dynamics, stability, ergodicity, etc.
Note: For some people, the term Markov chain always refers to a process with a finite or
discrete state space. We follow the mainstream mathematical literature (e.g., [MT09]) in using the
term to refer to any discrete time Markov process
You are probably aware that some distributions can be represented by densities and some cannot
(For example, distributions on the real numbers R that put positive probability on individual
points have no density representation)
We are going to start our analysis by looking at Markov chains where the one step transition
probabilities have density representations
The benefit is that the density case offers a very direct parallel to the finite case in terms of notation
and intuition
Once we've built some intuition we'll cover the general case
Definitions and Basic Properties In our lecture on finite Markov chains, we studied discrete time
Markov chains that evolve on a finite state space S
In this setting, the dynamics of the model are described by a stochastic matrix: a nonnegative square matrix $P = P[i, j]$ such that each row $P[i, \cdot]$ sums to one
The interpretation of P is that P[i, j] represents the probability of transitioning from state i to state
j in one unit of time
In symbols,
P{ Xt+1 = j | Xt = i } = P[i, j]
Equivalently,
- $P$ can be thought of as a family of distributions $P[i, \cdot]$, one for each $i \in S$
- $P[i, \cdot]$ is the distribution of $X_{t+1}$ given $X_t = i$
(As you probably recall, when using NumPy arrays, $P[i, \cdot]$ is expressed as P[i,:])
In this section, we'll allow $S$ to be a subset of $\mathbb{R}$, such as
- $\mathbb{R}$ itself
- the positive reals $(0, \infty)$
- a bounded interval $(a, b)$
The family of discrete distributions $P[i, \cdot]$ will be replaced by a family of densities $p(x, \cdot)$, one for each $x \in S$

Analogous to the finite state case, $p(x, \cdot)$ is to be understood as the distribution (density) of $X_{t+1}$ given $X_t = x$
More formally, a stochastic kernel on $S$ is a function $p : S \times S \to \mathbb{R}$ with the property that

1. $p(x, y) \geq 0$ for all $x, y \in S$
2. $\int p(x, y)\, dy = 1$ for all $x \in S$
(Integrals are over the whole space unless otherwise specified)
For example, let $S = \mathbb{R}$ and consider the particular stochastic kernel $p_w$ defined by

$$p_w(x, y) := \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{(y-x)^2}{2}\right\} \tag{3.1}$$

This kernel corresponds to the random walk

$$X_{t+1} = X_t + \xi_{t+1} \quad \text{where} \quad \{\xi_t\} \stackrel{IID}{\sim} N(0, 1) \tag{3.2}$$

More generally, many models of interest take the form

$$X_{t+1} = \mu(X_t) + \sigma(X_t)\, \xi_{t+1} \tag{3.3}$$

For instance, an ARCH-type volatility model sets $X_{t+1} = \alpha X_t + \sigma_t \xi_{t+1}$ with

$$\sigma_t^2 = \beta + \gamma X_t^2, \qquad \beta, \gamma > 0 \tag{3.4}$$

This is a special case of (3.3) with $\mu(x) = \alpha x$ and $\sigma(x) = (\beta + \gamma x^2)^{1/2}$

Example 3: With stochastic production and a constant savings rate, the one-sector neoclassical growth model leads to a law of motion for capital per worker such as
$$k_{t+1} = s A_{t+1} f(k_t) + (1 - \delta)\, k_t \tag{3.5}$$
Here

- $s$ is the rate of savings
- $A_{t+1}$ is a production shock (the $t+1$ subscript indicates that $A_{t+1}$ is not visible at time $t$)
- $\delta$ is a depreciation rate
- $f : \mathbb{R}_+ \to \mathbb{R}_+$ is a production function satisfying $f(k) > 0$ whenever $k > 0$
(The fixed savings rate can be rationalized as the optimal policy for a particular set of technologies
and preferences (see [LS12], section 3.1.2), although we omit the details here)
Equation (3.5) is a special case of (3.3) with $\mu(x) = (1-\delta)x$ and $\sigma(x) = s f(x)$
Now let's obtain the stochastic kernel corresponding to the generic model (3.3)

To find it, note first that if $U$ is a random variable with density $f_U$, and $V = a + bU$ for some constants $a, b$ with $b > 0$, then the density of $V$ is given by

$$f_V(v) = \frac{1}{b}\, f_U\left(\frac{v-a}{b}\right) \tag{3.6}$$

(The proof is below. For a multidimensional version see EDTC, theorem 8.1.3)
Taking (3.6) as given for the moment, we can obtain the stochastic kernel $p$ for (3.3) by recalling that $p(x, \cdot)$ is the conditional density of $X_{t+1}$ given $X_t = x$

In the present case, this is equivalent to stating that $p(x, \cdot)$ is the density of $Y := \mu(x) + \sigma(x)\, \xi_{t+1}$ when $\xi_{t+1} \sim \phi$, where $\phi$ denotes the density of the shock

Hence, by (3.6),

$$p(x, y) = \frac{1}{\sigma(x)}\, \phi\left(\frac{y - \mu(x)}{\sigma(x)}\right) \tag{3.7}$$
For example, returning to the growth model (3.5), if the production shock $A_{t+1}$ has density $\phi$, then the corresponding stochastic kernel is

$$p(x, y) = \frac{1}{s f(x)}\, \phi\left(\frac{y - (1-\delta)x}{s f(x)}\right) \tag{3.8}$$
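The kernel (3.8) is easy to code directly. The sketch below assumes a lognormal shock and a Cobb-Douglas production function; these specifications and the parameter values are assumptions made for illustration.

from scipy.stats import lognorm

s, delta, alpha, shock_sd = 0.2, 0.1, 0.4, 0.4   # assumed parameter values
phi = lognorm(shock_sd)       # density of the production shock A_{t+1}

def f(k):
    "Cobb-Douglas production (an assumption for this illustration)."
    return k**alpha

def p(x, y):
    "Stochastic kernel (3.8) for the growth model."
    d = s * f(x)
    return phi.pdf((y - (1 - delta) * x) / d) / d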
Recall that, in the finite case, the marginal distributions $\{\psi_t\}$ of the chain update according to

$$\psi_{t+1}(j) = \sum_{i \in S} P[i, j]\, \psi_t(i), \qquad j \in S$$

This intuitive equality states that the probability of being at $j$ tomorrow is the probability of visiting $i$ today and then going on to $j$, summed over all possible $i$

In the density case, we just replace the sum with an integral and probability mass functions with densities, yielding

$$\psi_{t+1}(y) = \int p(x, y)\, \psi_t(x)\, dx, \qquad y \in S \tag{3.9}$$

It is convenient to think of this updating process in terms of the Markov operator defined by

$$(\psi P)(y) = \int p(x, y)\, \psi(x)\, dx \tag{3.10}$$

With this notation, (3.9) can be written compactly as

$$\psi_{t+1} = \psi_t P \tag{3.11}$$
Equation (3.11) tells us that if we specify a distribution for 0 , then the entire sequence of future
distributions can be obtained by iterating with P
Computation To learn about the dynamics of a given process, it's useful to compute and study the sequences of densities generated by the model

One way to do this is to try to implement the iteration described by (3.10) and (3.11) using numerical integration

However, to produce $\psi P$ from $\psi$ via (3.10), you would need to integrate at every $y$, and there is a continuum of such $y$

Another possibility is to discretize the model, but this introduces errors of unknown size

A nicer alternative in the present setting is to combine simulation with an elegant estimator called the look ahead estimator

Let's go over the ideas with reference to the growth model discussed above, the dynamics of which we repeat here for convenience:
$$k_{t+1} = s A_{t+1} f(k_t) + (1 - \delta)\, k_t \tag{3.12}$$
Our aim is to compute the sequence $\{\psi_t\}$ associated with this model and fixed initial condition $\psi_0$

To approximate $\psi_t$ by simulation, recall that, by definition, $\psi_t$ is the density of $k_t$ given $k_0 \sim \psi_0$

If we wish to generate observations of this random variable, all we need to do is

1. draw $k_0$ from the specified initial condition $\psi_0$
2. draw the shocks $A_1, \ldots, A_t$ from their specified density
3. compute $k_t$ iteratively via (3.12)

If we repeat this $n$ times, we get $n$ independent observations $k_t^1, \ldots, k_t^n$

With these draws in hand, the next step is to generate some kind of representation of their distribution $\psi_t$

A naive approach would be to use a histogram, or perhaps a smoothed histogram using SciPy's gaussian_kde function

However, in the present setting there is a much better way to do this, based on the look-ahead estimator

With this estimator, to construct an estimate of $\psi_t$, we actually generate $n$ observations of $k_{t-1}$, rather than $k_t$
Now we take these $n$ observations $k_{t-1}^1, \ldots, k_{t-1}^n$ and form the estimate

$$\psi_t^n(y) = \frac{1}{n}\sum_{i=1}^{n} p(k_{t-1}^i, y) \tag{3.13}$$

By the law of large numbers, as $n \to \infty$,

$$\frac{1}{n}\sum_{i=1}^{n} p(k_{t-1}^i, y) \to \mathbb{E}\, p(k_{t-1}^i, y) = \int p(x, y)\, \psi_{t-1}(x)\, dx = \psi_t(y)$$
Given our use of the __call__ method, an instance of LAE acts as a callable object, which is essentially a function that can store its own data (see this discussion)
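A minimal sketch of the idea behind such a class, consistent with estimator (3.13) (this is a simplified, non-vectorized version, not the library implementation):

import numpy as np

class LAE:
    "Look-ahead estimator built from kernel p and observations X^1, ..., X^n."
    def __init__(self, p, X):
        self.p, self.X = p, X

    def __call__(self, y):
        "Evaluate the density estimate (3.13) at point(s) y."
        return np.mean([self.p(x, y) for x in self.X], axis=0)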
The figure shows part of the density sequence $\{\psi_t\}$, with each density computed via the look-ahead estimator
Notice that the sequence of densities shown in the figure seems to be converging; more on this in just a moment
Another quick comment is that each of these distributions could be interpreted as a cross sectional
distribution (recall this discussion)
Up until now, we have focused exclusively on continuous state Markov chains where all conditional distributions $p(x, \cdot)$ are densities
As discussed above, not all distributions can be represented as densities
If the conditional distribution of Xt+1 given Xt = x cannot be represented as a density for some
x S, then we need a slightly different theory
The ultimate option is to switch from densities to probability measures, but not all readers will be
familiar with measure theory
We can, however, construct a fairly general theory using distribution functions
Example and Definitions To illustrate the issues, recall that Hopenhayn and Rogerson [HR93]
study a model of firm dynamics where individual firm productivity follows the exogenous process
$$X_{t+1} = a + \rho X_t + \xi_{t+1}, \qquad \text{where} \quad \{\xi_t\} \stackrel{\textrm{IID}}{\sim} N(0, \sigma^2)$$
Productivity is assumed to lie in the unit interval, which can be achieved by censoring the process at the boundaries, replacing the law of motion with $X_{t+1} = h(a + \rho X_t + \xi_{t+1})$, where
$$h(x) := x\, \mathbf{1}\{0 \le x \le 1\} + \mathbf{1}\{x > 1\}$$
If you think about it, you will see that for any given $x \in [0, 1]$, the conditional distribution of $X_{t+1}$ given $X_t = x$ puts positive probability mass on 0 and 1
Hence it cannot be represented as a density
What we can do instead is use cumulative distribution functions (cdfs)
To this end, set
$$G(x, y) := \mathbb{P}\{h(a + \rho x + \xi_{t+1}) \le y\} \qquad (0 \le x, y \le 1)$$
This family of cdfs $G(x, \cdot)$ plays a role analogous to the stochastic kernel in the density case
The distribution dynamics in (3.9) are then replaced by
$$F_{t+1}(y) = \int G(x, y)\, F_t(dx) \tag{3.14}$$
Here Ft and Ft+1 are cdfs representing the distribution of the current state and next period state
The intuition behind (3.14) is essentially the same as for (3.9)
Computation If you wish to compute these cdfs, you cannot use the look-ahead estimator as
before
Indeed, you should not use any density estimator, since the objects you are estimating/computing
are not densities
One good option is simulation as before, combined with the empirical distribution function
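As a sketch, the empirical distribution function of a simulated sample takes only a few lines; the sample itself is assumed to come from simulating the censored productivity process above:

import numpy as np

def ecdf_factory(sample):
    # returns the empirical cdf y -> fraction of observations <= y,
    # a natural estimator of the cdfs appearing in (3.14)
    x = np.sort(np.asarray(sample))
    def F(y):
        return np.searchsorted(x, y, side='right') / len(x)
    return F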
In our lecture on finite Markov chains we also studied stationarity, stability and ergodicity
Here we will cover the same topics for the continuous case
We will, however, treat only the density case (as in this section), where the stochastic kernel is a
family of densities
The general case is relatively similar; references are given below
Theoretical Results Analogous to the finite case, given a stochastic kernel p and corresponding Markov operator as defined in (3.10), a density $\psi^*$ on S is called stationary for P if it is a fixed point of the operator P
In other words,
$$\psi^*(y) = \int p(x, y)\, \psi^*(x)\, dx, \qquad \forall\, y \in S \tag{3.15}$$
As with the finite case, if $\psi^*$ is stationary for P, and the distribution of $X_0$ is $\psi^*$, then, in view of (3.11), $X_t$ will have this same distribution for all t
Hence $\psi^*$ is the stochastic equivalent of a steady state
In the finite case, we learned that at least one stationary distribution exists, although there may be
many
When the state space is infinite, the situation is more complicated
Even existence can fail very easily
For example, the random walk model has no stationary density (see, e.g., EDTC, p. 210)
However, there are well-known conditions under which a stationary density exists
With additional conditions, we can also get a unique stationary density ($\psi \in \mathcal{D}$ and $\psi = \psi P$ implies $\psi = \psi^*$, where $\mathcal{D}$ is the set of densities on S), and also global convergence in the sense that
$$\psi \in \mathcal{D} \implies \psi P^t \to \psi^* \ \text{ as } \ t \to \infty \tag{3.16}$$
This combination of existence, uniqueness and global convergence in the sense of (3.16) is often
referred to as global stability
Under very similar conditions, we get ergodicity, which means that
$$\frac{1}{n} \sum_{t=1}^{n} h(X_t) \to \int h(x)\, \psi^*(x)\, dx \qquad \text{as } n \to \infty \tag{3.17}$$
for any (measurable) function $h \colon S \to \mathbb{R}$ such that the right-hand side is finite
Note that the convergence in (3.17) does not depend on the distribution (or value) of $X_0$
This is actually very important for simulation: it means we can learn about $\psi^*$ (i.e., approximate the right-hand side of (3.17) via the left-hand side) without requiring any special knowledge about the distribution of $X_0$
So what are these conditions we require to get global stability and ergodicity?
In essence, it must be the case that
1. Probability mass does not drift off to the edges of the state space
2. Sufficient mixing obtains
For one such set of conditions see theorem 8.2.14 of EDTC
In addition:
• [SLP89] contains a classic (but slightly outdated) treatment of these topics
• From the mathematical literature, [LM94] and [MT09] give outstanding in-depth treatments
• Section 8.1.2 of EDTC provides detailed intuition, and section 8.3 gives additional references
• EDTC, section 11.3.4 provides a specific treatment for the growth model we considered in this lecture
An Example of Stability As stated above, the growth model treated here is stable under mild conditions on the primitives
See EDTC, section 11.3.4 for more details
We can see this stability in action (in particular, the convergence in (3.16)) by simulating the path of densities from various initial conditions
Here is such a figure
All sequences are converging towards the same limit, regardless of their initial condition
The details regarding initial conditions and so on are given in this exercise, where you are asked to
replicate the figure
Computing Stationary Densities In the preceding figure, each sequence of densities is converging towards the unique stationary density $\psi^*$
Even from this figure we can get a fair idea what $\psi^*$ looks like, and where its mass is located
However, there is a much more direct way to estimate the stationary density, and it involves only a slight modification of the look-ahead estimator
Let's say that we have a model of the form (3.3) that is stable and ergodic
Let p be the corresponding stochastic kernel, as given in (3.7)
To approximate the stationary density $\psi^*$, we can simply generate a long time series $X_0, X_1, \ldots, X_n$ and estimate $\psi^*$ via
$$\psi_n^*(y) = \frac{1}{n} \sum_{t=1}^{n} p(X_t, y) \tag{3.18}$$
This is essentially the same as the look-ahead estimator (3.13), except that now the observations we generate are a single time series, rather than a cross section
The justification for (3.18) is that, with probability one as $n \to \infty$,
$$\frac{1}{n} \sum_{t=1}^{n} p(X_t, y) \to \int p(x, y)\, \psi^*(x)\, dx = \psi^*(y)$$
where the convergence is by (3.17) and the equality on the right is by (3.15)
The right hand side is exactly what we want to compute
On top of this asymptotic result, it turns out that the rate of convergence for the look-ahead estimator is very good
The first exercise helps illustrate this point
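For concreteness, here is a sketch of (3.18): given a stochastic kernel p of the form (3.7), assumed vectorized in its first argument, and a simulated time series, we simply average the kernel over the observations:

import numpy as np

def stationary_lae(p, X, ygrid):
    # psi*_n(y) = (1/n) sum_t p(X_t, y), evaluated at each y in ygrid
    X = np.asarray(X)
    return np.array([np.mean(p(X, y)) for y in ygrid])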
Exercises
Exercise 1 Consider the simple threshold autoregressive model
$$X_{t+1} = \theta |X_t| + (1 - \theta^2)^{1/2} \xi_{t+1}, \qquad \text{where} \quad \{\xi_t\} \stackrel{\textrm{IID}}{\sim} N(0, 1) \tag{3.19}$$
This is one of those rare nonlinear stochastic models where an analytical expression for the stationary density is available
In particular, provided that $|\theta| < 1$, there is a unique stationary density $\psi^*$ given by
$$\psi^*(y) = 2\, \phi(y)\, \Phi\!\left(\frac{\theta y}{(1 - \theta^2)^{1/2}}\right) \tag{3.20}$$
Here $\phi$ is the standard normal density and $\Phi$ is the standard normal cdf
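The closed form (3.20) is easy to code, which makes the comparison asked for below straightforward; a sketch:

import numpy as np
from scipy.stats import norm

theta = 0.8

def psi_star(y):
    # stationary density (3.20) of the threshold AR(1) model (3.19)
    return 2 * norm.pdf(y) * norm.cdf(theta * y / np.sqrt(1 - theta**2))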
As an exercise, compute the look-ahead estimate of $\psi^*$, as defined in (3.18), and compare it with $\psi^*$ in (3.20) to see whether they are indeed close for large n
In doing so, set $\theta = 0.8$ and $n = 500$
The next figure shows the result of such a computation
The additional density (black line) is a nonparametric kernel density estimate, added to the solution for illustration
(You can try to replicate it before looking at the solution if you want to)
As you can see, the look-ahead estimator is a much tighter fit than the kernel density estimator
If you repeat the simulation you will see that this is consistently the case
Exercise 2 Replicate the figure on global convergence shown above
The densities come from the stochastic growth model treated at the start of the lecture
Begin with the code found in examples/stochasticgrowth.py
Use the same parameters
For the four initial distributions, use the shifted beta distributions
psi_0 = beta(5, 5, scale=0.5, loc=i*2)
for i in range(4)
n = 500
x = np.random.randn(n)          # N(0, 1)
x = np.exp(x)                   # Map x to lognormal
y = np.random.randn(n) + 2.0    # N(2, 1)
z = np.random.randn(n) + 4.0    # N(4, 1)
Each data set is represented by a box, where the top and bottom of the box are the third and first quartiles of the data, and the red line in the center is the median
The boxes give some indication as to
• the location of probability mass for each sample
• whether the distribution is right-skewed (as is the lognormal distribution), etc
Now let's put these ideas to use in a simulation
Consider the threshold autoregressive model in (3.19)
Solution notebook
Appendix
Here's a proof of (3.6). Let $F_V$ and $F_U$ denote the cdfs of V and U. Since $b > 0$,
$$F_V(v) = \mathbb{P}\{a + bU \le v\} = \mathbb{P}\{U \le (v - a)/b\} = F_U\!\left(\frac{v - a}{b}\right)$$
Differentiating with respect to v gives (3.6)
Overview
An asset is a claim on a stream of prospective payments. What is the correct price to pay for such a claim? The elegant asset pricing model of Lucas [Luc78] attempts to answer this question in an equilibrium setting with risk averse agents
While we mentioned some consequences of Lucas' model earlier, it is now time to work through the model more carefully, and try to understand where the fundamental asset pricing equation comes from
A side benefit of studying Lucas' model is that it provides a beautiful illustration of model building in general and equilibrium pricing in competitive models in particular
The Lucas Model
Lucas studied a pure exchange economy with a representative consumer (or household), where
Pure exchange means that all endowments are exogenous
Representative consumer means that either
there is a single consumer (sometimes also referred to as a household), or
all consumers have identical endowments and preferences
Either way, the assumption of a representative agent means that prices adjust to eradicate desires
to trade
This makes it very easy to compute competitive equilibrium prices
Basic Setup Let's review the setup
Assets There is a single productive unit that costlessly generates a sequence of consumption goods $\{y_t\}_{t=0}^{\infty}$
Another way to view $\{y_t\}_{t=0}^{\infty}$ is as a consumption endowment for this economy
We will assume that this endowment is Markovian, following the exogenous process
$$y_{t+1} = G(y_t, \xi_{t+1})$$
Here $\{\xi_t\}$ is an iid shock sequence with known distribution $\phi$ and $y_t \ge 0$
An asset is a claim on all or part of this endowment stream
The consumption goods $\{y_t\}_{t=0}^{\infty}$ are nonstorable, so holding assets is the only way to transfer wealth into the future
For the purposes of intuition, it's common to think of the productive unit as a tree that produces fruit
Based on this idea, a Lucas tree is a claim on the consumption endowment
Consumers A representative consumer ranks consumption streams $\{c_t\}$ according to the time separable utility functional
$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t u(c_t) \tag{3.21}$$
Here
• $\beta \in (0, 1)$ is a fixed discount factor
• u is a strictly increasing, strictly concave, continuously differentiable period utility function
• $\mathbb{E}$ is a mathematical expectation
Pricing a Lucas Tree What is an appropriate price for a claim on the consumption endowment?
We'll price an ex-dividend claim, meaning that
• the seller retains this period's dividend
• the buyer pays $p_t$ today to purchase a claim on $y_{t+1}$ and the right to sell the claim tomorrow at price $p_{t+1}$
Since this is a competitive model, the first step is to pin down consumer behavior, taking prices as
given
Next well impose equilibrium constraints and try to back out prices
In the consumer problem, the consumer's control variable is the share $\pi_t$ of the claim held in each period
Thus, the consumer problem is to maximize (3.21) subject to
$$c_t + \pi_{t+1} p_t \le \pi_t y_t + \pi_t p_t$$
along with $c_t \ge 0$ and $0 \le \pi_t \le 1$ at each t
The decision to hold share $\pi_t$ is actually made at time $t - 1$
But this value is inherited as a state variable at time t, which explains the choice of subscript
The dynamic program We can write the consumer problem as a dynamic programming problem
Our first observation is that prices depend on current information, and current information is
really just the endowment process up until the current period
In fact the endowment process is Markovian, so that the only relevant information is the current
state $y \in \mathbb{R}_+$ (dropping the time subscript)
This leads us to guess an equilibrium where price is a function p of y
Remarks on the solution method
Since this is a competitive (read: price taking) model, the consumer will take this function p
as given
In this way we determine consumer behavior given p and then use equilibrium conditions
to recover p
This is the standard way to solve competitive equilibrium models
Using the assumption that price is a given function p of y, we write the value function and constraint as
$$v(\pi, y) = \max_{c,\, \pi'} \left\{ u(c) + \beta \int v(\pi', G(y, z))\, \phi(dz) \right\}$$
subject to
$$c + \pi' p(y) \le \pi y + \pi p(y) \tag{3.22}$$
We can invoke the fact that utility is increasing to claim equality in (3.22) and hence eliminate the
constraint, obtaining
$$v(\pi, y) = \max_{\pi'} \left\{ u[\pi (y + p(y)) - \pi' p(y)] + \beta \int v(\pi', G(y, z))\, \phi(dz) \right\} \tag{3.23}$$
The solution to this dynamic programming problem is an optimal policy expressing either $\pi'$ or c as a function of the state $(\pi, y)$
Each one determines the other, since $c(\pi, y) = \pi(y + p(y)) - \pi'(\pi, y)\, p(y)$
Next steps What we need to do now is determine equilibrium prices
It seems that to obtain these, we will have to
1. Solve this two dimensional dynamic programming problem for the optimal policy
2. Impose equilibrium constraints
3. Solve out for the price function p(y) directly
However, as Lucas showed, there is a related but more straightforward way to do this
Equilibrium constraints Since the consumption good is not storable, in equilibrium we must
have ct = yt for all t
In addition, since there is one representative consumer (alternatively, since all consumers are identical), there should be no trade in equilibrium
In particular, the representative consumer owns the whole tree in every period, so $\pi_t = 1$ for all t
Prices must adjust to satisfy these two constraints
The equilibrium price function Now observe that the first order condition for (3.23) can be written as
$$u'(c)\, p(y) = \beta \int v_1'(\pi', G(y, z))\, \phi(dz)$$
where $v_1'$ is the derivative of v with respect to its first argument
To obtain $v_1'$ we can simply differentiate the right-hand side of (3.23) with respect to $\pi$, yielding
$$v_1'(\pi, y) = u'(c)\, (y + p(y))$$
Next we impose the equilibrium constraints while combining the last two equations to get
$$p(y) = \beta \int \frac{u'[G(y, z)]}{u'(y)}\, [G(y, z) + p(G(y, z))]\, \phi(dz) \tag{3.24}$$
Equation (3.24) is a functional equation in the unknown function p. Setting $f(y) := u'(y)\, p(y)$, we can rewrite it as
$$f(y) = h(y) + \beta \int f[G(y, z)]\, \phi(dz) \tag{3.27}$$
where $h(y) := \beta \int u'[G(y, z)]\, G(y, z)\, \phi(dz)$ is a function that depends only on the primitives
To solve (3.27), introduce the operator T mapping a function f into the function T f defined by
$$(T f)(y) = h(y) + \beta \int f[G(y, z)]\, \phi(dz) \tag{3.28}$$
The reason we do this is that a solution to (3.27) now corresponds to a function f satisfying
( T f )(y) = f (y) for all y
In other words, a solution is a fixed point of T
This means that we can use fixed point theory to obtain and compute the solution
A little fixed point theory Let $cb\mathbb{R}_+$ be the set of continuous bounded functions $f \colon \mathbb{R}_+ \to \mathbb{R}_+$
We now aim to show that
1. T has exactly one fixed point $f^*$ in $cb\mathbb{R}_+$, and
2. for any $f \in cb\mathbb{R}_+$, the sequence of iterates $T^k f$ converges uniformly to $f^*$
(Note: If you find the mathematics heavy going you can take 1-2 as given and skip to the next section)
Recall the Banach contraction mapping theorem
It tells us that the previous statements will be true if we can find an $\alpha < 1$ such that
$$\| T f - T g \| \le \alpha\, \| f - g \|, \qquad \forall\, f, g \in cb\mathbb{R}_+ \tag{3.29}$$
Here $\|h\| := \sup_{x \in \mathbb{R}_+} |h(x)|$
Observe that, since integrals get larger when absolute values are moved to the inside,
$$|T f(y) - T g(y)| = \left| \beta \int f[G(y, z)]\, \phi(dz) - \beta \int g[G(y, z)]\, \phi(dz) \right| \le \beta \int \left| f[G(y, z)] - g[G(y, z)] \right| \phi(dz) \le \beta\, \| f - g \|$$
Since the right-hand side is an upper bound that does not depend on y, taking the sup over all y on the left-hand side gives (3.29) with $\alpha := \beta$
Computation: An Example The preceding discussion tells us that we can compute $f^*$ by picking any arbitrary $f \in cb\mathbb{R}_+$ and then iterating with T
The equilibrium price function $p^*$ can then be recovered via $p^*(y) = f^*(y)/u'(y)$
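A minimal sketch of this successive approximation scheme, assuming T is implemented as a function acting on an array of values over a grid (as in the LucasTree code below), with iterates compared in the sup norm as in (3.29):

import numpy as np

def iterate_to_fixed_point(T, f, tol=1e-6, max_iter=500):
    # compute an approximation to f* by iterating f, Tf, T^2 f, ...
    for _ in range(max_iter):
        Tf = T(f)
        if np.max(np.abs(Tf - f)) < tol:
            return Tf
        f = Tf
    return f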
Let's try this when $\ln y_{t+1} = \alpha \ln y_t + \sigma \epsilon_{t+1}$, where $\{\epsilon_t\}$ is iid and standard normal
Utility will take the isoelastic form $u(c) = c^{1-\gamma}/(1-\gamma)$, where $\gamma > 0$ is the coefficient of relative risk aversion
Some code to implement the iterative computational procedure can be found in lucastree.py from
the QuantEcon package
We repeat it here for convenience
r"""
Filename: lucastree.py
Authors: Thomas Sargent, John Stachurski, Spencer Lyon
Solves the price function for the Lucas tree in a continuous state
setting, using piecewise linear approximation for the sequence of
candidate price functions. The consumption endowment follows the log
linear AR(1) process
.. math::
\log y' = \alpha \log y + \sigma \epsilon
where y' is a next period y and epsilon is an iid standard normal shock.
Hence
.. math::
y' = y^{\alpha} * \xi,
where
.. math::
\xi = e^{\sigma \epsilon}
The distribution phi of xi is
.. math::
\phi = LN(0, \sigma^2),
where LN means lognormal.
"""
from __future__ import division # == Omit for Python 3.x == #
import numpy as np
from scipy import interp
from scipy.stats import lognorm
from scipy.integrate import fixed_quad
from ..compute_fp import compute_fixed_point
class LucasTree(object):
"""
Class to solve for the price of the Lucas tree in the Lucas
asset pricing model
Parameters
----------
gamma : scalar(float)
    The coefficient of risk aversion in the household's CRRA utility
    function
beta : scalar(float)
    The household's discount factor
alpha : scalar(float)
    The correlation coefficient in the shock process
sigma : scalar(float)
    The volatility of the shock process
grid : array_like(float), optional(default=None)
    The grid points on which to evaluate the asset prices. Grid
    points should be nonnegative. If None is passed, we will create
    a reasonable one for you
Attributes
----------
for i, y in enumerate(grid):
    # == u'(G(y,z)) G(y,z) == #
    integrand = lambda z: (y**alpha * z)**(1 - gamma)
    h[i] = beta * self.integrate(integrand)
return h
def _new_grid(self):
    """
    Construct the default grid for the problem
    This is defined to be np.linspace(0, 10, 100) when alpha > 1
    and 100 evenly spaced points covering 4 standard deviations
    when alpha < 1
    """
    grid_size = 100
    if abs(self.alpha) >= 1.0:
        grid_min, grid_max = 0.0, 10.0
    else:
        # == Set the grid interval to contain most of the mass of the
        #    stationary distribution of the consumption endowment == #
        ssd = self.sigma / np.sqrt(1 - self.alpha**2)
        grid_min, grid_max = np.exp(-4 * ssd), np.exp(4 * ssd)
    grid = np.linspace(grid_min, grid_max, grid_size)
    return grid, grid_min, grid_max, grid_size
def integrate(self, g, int_min=None, int_max=None):
    """
    Integrate the function g(z) * self.phi(z) from int_min to
    int_max.

    Parameters
    ----------
    g : function
        The function to integrate
    int_min, int_max : scalar(float), optional
        The bounds of integration. If either of these parameters are
        `None` (the default), they will be set to 4 standard
        deviations above and below the mean.

    Returns
    -------
    result : scalar(float)
        The result of the integration
    """
    # == Simplify notation == #
    phi = self.phi
    if int_min is None:
        int_min = self._int_min
Exercise 1 Replicate the figure to show how discount rates affect prices
Solutions
Solution notebook
Overview
Next we study a computational problem concerning career and job choices. The model is originally due to Derek Neal [Nea99] and this exposition draws on the presentation in [LS12], section 6.5.
Model features
• career and job within career both chosen to maximize expected discounted wage flow
• infinite horizon dynamic programming with two state variables
Model A worker's wage is the sum of a career component $\theta_t$ and a job component $\epsilon_t$, and the worker seeks to maximize the expected discounted wage flow
$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t w_t \tag{3.30}$$
where
$$w_t = \theta_t + \epsilon_t \tag{3.31}$$
At the start of each period, the worker either keeps the current (career, job) pair, keeps the career but draws a new job from the distribution G, or draws both a new career from F and a new job from G. The value function therefore satisfies $V(\theta, \epsilon) = \max\{I, II, III\}$, where
$$\begin{aligned}
I &= \theta + \epsilon + \beta\, V(\theta, \epsilon) \\
II &= \theta + \int \epsilon'\, G(d\epsilon') + \beta \int V(\theta, \epsilon')\, G(d\epsilon') \\
III &= \int \theta'\, F(d\theta') + \int \epsilon'\, G(d\epsilon') + \beta \int\!\!\int V(\theta', \epsilon')\, G(d\epsilon')\, F(d\theta')
\end{aligned}$$
Evidently I, II and III correspond to stay put, new job and new life, respectively
Parameterization As in [LS12], section 6.5, we will focus on a discrete version of the model, parameterized as follows:
• both $\theta$ and $\epsilon$ take values in the set np.linspace(0, B, N), an even grid of N points between 0 and B inclusive
• N = 50
• B = 5
• $\beta = 0.95$
The distributions F and G are discrete distributions generating draws from the grid points
np.linspace(0, B, N)
A very useful family of discrete distributions is the Beta-binomial family, with probability mass function
$$p(k \,|\, n, a, b) = \binom{n}{k} \frac{B(k + a,\; n - k + b)}{B(a, b)}, \qquad k = 0, \ldots, n$$
Interpretation:
• draw q from a Beta distribution with shape parameters (a, b)
• run n independent binary trials, each with success probability q
• p(k | n, a, b) is the probability of k successes in these n trials
Nice properties:
• very flexible class of distributions, including uniform, symmetric unimodal, etc.
• only three parameters
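Since the pmf above involves only binomial coefficients and the Euler beta function, it can be coded directly; a sketch using SciPy:

from scipy.special import binom, beta as beta_fn

def beta_binomial_pmf(k, n, a, b):
    # p(k | n, a, b) as displayed above; beta_fn is the Euler beta function
    return binom(n, k) * beta_fn(k + a, n - k + b) / beta_fn(a, b)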
Here's a figure showing the effect of different shape parameters when n = 50
The QuantEcon package provides some code for solving the DP problem described above
See in particular this file, which is repeated here for convenience
"""
Filename: career.py
Authors: Thomas Sargent, John Stachurski
379
A class to solve the career / job choice model due to Derek Neal.
References
----------
http://quant-econ.net/career.html
"""
import numpy as np
from quantecon.distributions import BetaBinomial
class CareerWorkerProblem(object):
"""
An instance of the class is an object with data on a particular
problem of this type, including probabilities, discount factor and
sample space for the variables.
Parameters
----------
beta : scalar(float), optional(default=0.95)
    Discount factor
B : scalar(float), optional(default=5.0)
    Upper bound for both epsilon and theta
N : scalar(int), optional(default=50)
    Number of possible realizations for both epsilon and theta
F_a : scalar(int or float), optional(default=1)
    Parameter `a` from the career distribution
F_b : scalar(int or float), optional(default=1)
    Parameter `b` from the career distribution
G_a : scalar(int or float), optional(default=1)
    Parameter `a` from the job distribution
G_b : scalar(int or float), optional(default=1)
    Parameter `b` from the job distribution
Attributes
----------
beta, B, N : see Parameters
theta : array_like(float, ndim=1)
    A grid of values from 0 to B
epsilon : array_like(float, ndim=1)
    A grid of values from 0 to B
F_probs : array_like(float, ndim=1)
    The probabilities of different values for F
G_probs : array_like(float, ndim=1)
    The probabilities of different values for G
F_mean : scalar(float)
    The mean of the distribution for F
G_mean : scalar(float)
    The mean of the distribution for G
Interpretation:
• If both job and career are poor or mediocre, the worker will experiment with a new job and new career
• If career is sufficiently good, the worker will hold it and experiment with new jobs until a sufficiently good one is found
• If both job and career are good, the worker will stay put
Notice that the worker will always hold on to a sufficiently good career, but not necessarily hold
on to even the best paying job
The reason is that high lifetime wages require both variables to be large, and the worker cannot
change careers without changing jobs
Sometimes a good job must be sacrificed in order to change to a better career
Exercises
Exercise 1 Using the default parameterization in the class CareerWorkerProblem, generate and plot typical sample paths for $\theta$ and $\epsilon$ when the worker follows the optimal policy
In particular, modulo randomness, reproduce the following figure (where the horizontal axis represents time)
Hint: To generate the draws from the distributions F and G, use the class DiscreteRV
Exercise 2 Let's now consider how long it takes for the worker to settle down to a permanent job, given a starting point of $(\theta, \epsilon) = (0, 0)$
Solution notebook
Overview
Model features
• job-specific human capital accumulation combined with on-the-job search
• infinite horizon dynamic programming with one state variable and two controls
Model
Let
• $x_t$ denote the time-t job-specific human capital of a worker employed at a given firm
• $w_t$ denote current wages
Let $w_t = x_t(1 - s_t - \phi_t)$, where
• $\phi_t$ is investment in job-specific human capital for the current role
• $s_t$ is search effort, devoted to obtaining new offers from other firms
For as long as the worker remains in the current job, evolution of $\{x_t\}$ is given by $x_{t+1} = G(x_t, \phi_t)$
When search effort at t is $s_t$, the worker receives a new job offer with probability $\pi(s_t) \in [0, 1]$
The value of the offer is $U_{t+1}$, where $\{U_t\}$ is iid with common distribution F
The worker has the right to reject the current offer and continue with the existing job
In particular, $x_{t+1} = U_{t+1}$ if the worker accepts and $x_{t+1} = G(x_t, \phi_t)$ if the worker rejects
Letting $b_{t+1} \in \{0, 1\}$ be binary with $b_{t+1} = 1$ indicating an offer, we can write
$$x_{t+1} = (1 - b_{t+1})\, G(x_t, \phi_t) + b_{t+1} \max\{G(x_t, \phi_t),\, U_{t+1}\} \tag{3.32}$$
Agent's objective: maximize expected discounted sum of wages via controls $\{s_t\}$ and $\{\phi_t\}$
Taking the expectation of $V(x_{t+1})$ and using (3.32), the Bellman equation for this problem can be written as
$$V(x) = \max_{s + \phi \le 1} \left\{ x(1 - s - \phi) + \beta(1 - \pi(s))\, V[G(x, \phi)] + \beta\pi(s) \int V[G(x, \phi) \vee u]\, F(du) \right\} \tag{3.33}$$
Here nonnegativity of s and $\phi$ is understood, and $a \vee b := \max\{a, b\}$
Back-of-the-Envelope Calculations Before we solve the model, let's make some quick calculations that provide intuition on what the solution should look like.
To begin, observe that the worker has two instruments to build capital and hence wages:
1. invest in capital specific to the current job via $\phi$
2. search for a new job with better job-specific capital match via s
Since wages are $x(1 - s - \phi)$, the marginal cost of investment via either $\phi$ or s is identical
Our risk neutral worker should focus on whatever instrument has the highest expected return
The relative expected return will depend on x
For example, suppose first that $x = 0.05$
• If $s = 1$ and $\phi = 0$, then since $G(x, \phi) = 0$, taking expectations of (3.32) gives expected next period capital equal to $\pi(s)\mathbb{E}U = \mathbb{E}U = 0.5$
• If $s = 0$ and $\phi = 1$, then next period capital is $G(x, \phi) = G(0.05, 1) \approx 0.23$
Both rates of return are good, but the return from search is better
Next suppose that $x = 0.4$
• If $s = 1$ and $\phi = 0$, then expected next period capital is again 0.5
• If $s = 0$ and $\phi = 1$, then $G(x, \phi) = G(0.4, 1) \approx 0.8$
The return from investment via $\phi$ dominates the expected return from search
Combining these observations gives us two informal predictions:
1. At any given state x, the two controls $\phi$ and s will function primarily as substitutes: the worker will focus on whichever instrument has the higher expected return
2. For sufficiently small x, search will be preferable to investment in job-specific human capital. For larger x, the reverse will be true
Now let's turn to implementation, and see if we can match our predictions.
Implementation
The QuantEcon package provides some code for solving the DP problem described above
See in particular jv.py, which is repeated here for convenience
"""
Filename: jv.py
Authors: Thomas Sargent, John Stachurski
References
----------
http://quant-econ.net/jv.html
class JvWorker(object):
r"""
A Jovanovic-type model of employment with on-the-job search. The
value function is given by
.. math::
V(x) = \max_{\phi, s} w(x, \phi, s)
for
.. math::
w(x, \phi, s) := x(1 - \phi - s)
+ \beta (1 - \pi(s)) V(G(x, \phi))
+ \beta \pi(s) E V[ \max(G(x, \phi), U)]
Here
* x = human capital
* s = search effort
* :math:`\phi` = investment in human capital
* :math:`\pi(s)` = probability of new offer given search level s
* :math:`x(1 - \phi - s)` = wage
* :math:`G(x, \phi)` = new human capital when current job retained
* U = RV with distribution F -- new draw of human capital
Parameters
----------
A : scalar(float), optional(default=1.4)
    Parameter in human capital transition function
alpha : scalar(float), optional(default=0.6)
    Parameter in human capital transition function
where
$$w(s, \phi) := -\left( x(1 - s - \phi) + \beta(1 - \pi(s))\, V[G(x, \phi)] + \beta\pi(s) \int V[G(x, \phi) \vee u]\, F(du) \right) \tag{3.34}$$
Here we are minimizing instead of maximizing to fit with SciPy's optimization routines
When we represent V, it will be with a NumPy array V giving values on grid x_grid
But to evaluate the right-hand side of (3.34), we need a function, so we replace the arrays V and x_grid with a function Vf that gives linear interpolation of V on x_grid
Hence in the preliminaries of bellman_operator
• from the array V we define a linear interpolation Vf of its values
• c1 is used to implement the constraint $s + \phi \le 1$
• c2 is used to implement $s \ge \epsilon$, a numerically stable alternative to the true constraint $s \ge 0$
• c3 does the same for $\phi$
Inside the for loop, for each x in the grid over the state space, we set up the function $w(z) = w(s, \phi)$ defined in (3.34).
The function is minimized over all feasible $(s, \phi)$ pairs, either by
• a relatively sophisticated solver from SciPy called fmin_slsqp, or
• brute force search over a grid
The former is much faster, but convergence to the global optimum is not guaranteed. Grid search
is a simple way to check results
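As a toy illustration of the solver-based route (the objective here is made up, not the model's w), fmin_slsqp minimizes a smooth function subject to an inequality constraint such as $s + \phi \le 1$:

import numpy as np
from scipy.optimize import fmin_slsqp

obj = lambda z: -(np.sqrt(z[0]) + z[1])        # illustrative objective
ieq = lambda z: np.array([1.0 - z[0] - z[1]])  # >= 0 encodes s + phi <= 1
z_opt = fmin_slsqp(obj, np.array([0.4, 0.4]), f_ieqcons=ieq,
                   bounds=[(0.0, 1.0), (0.0, 1.0)])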
Let's plot the optimal policies and see what they look like
The code is in a file examples/jv_test.py from the main repository and looks as follows
import matplotlib.pyplot as plt  # needed for the plotting commands below
from quantecon import compute_fixed_point
from quantecon.models import JvWorker
# === solve for optimal policy === #
wp = JvWorker(grid_size=25)
v_init = wp.x_grid * 0.5
V = compute_fixed_point(wp.bellman_operator, v_init, max_iter=40)
s_policy, phi_policy = wp.bellman_operator(V, return_policies=True)
# === plot policies === #
fig, ax = plt.subplots()
ax.set_xlim(0, max(wp.x_grid))
ax.set_ylim(-0.1, 1.1)
ax.plot(wp.x_grid, phi_policy, 'b-', label='phi')
ax.plot(wp.x_grid, s_policy, 'g-', label='s')
ax.set_xlabel("x")
ax.legend()
plt.show()
Once x is larger, the worker does better by investing in human capital specific to the current position
Exercises
Exercise 1 Let's look at the dynamics for the state process $\{x_t\}$ associated with these policies.
The dynamics are given by (3.32) when $\phi_t$ and $s_t$ are chosen according to the optimal policies, and $\mathbb{P}\{b_{t+1} = 1\} = \pi(s_t)$.
Since the dynamics are random, analysis is a bit subtle
One way to do it is to plot, for each x in a relatively fine grid called plot_grid, a large number K
of realizations of xt+1 given xt = x. Plot this with one dot for each realization, in the form of a 45
degree diagram. Set:
K = 50
plot_grid_max, plot_grid_size = 1.2, 100
plot_grid = np.linspace(0, plot_grid_max, plot_grid_size)
fig, ax = plt.subplots()
ax.set_xlim(0, plot_grid_max)
ax.set_ylim(0, plot_grid_max)
By examining the plot, argue that under the optimal policies, the state $x_t$ will converge to a constant value $\bar{x}$ close to unity
Argue that at the steady state, $s_t \approx 0$ and $\phi_t \approx 0.6$.
Exercise 2 In the preceding exercise we found that $s_t$ converges to zero and $\phi_t$ converges to about 0.6
Since these results were calculated at a value of $\beta$ close to one, let's compare them to the best choice for an infinitely patient worker.
Intuitively, an infinitely patient worker would like to maximize steady state wages, which are a function of steady state capital.
You can take it as given (it's certainly true) that the infinitely patient worker does not search in the long run (i.e., $s_t = 0$ for large t)
Thus, given $\phi$, steady state capital is the positive fixed point $x^*(\phi)$ of the map $x \mapsto G(x, \phi)$.
Steady state wages can be written as $w^*(\phi) = x^*(\phi)(1 - \phi)$
Graph $w^*(\phi)$ with respect to $\phi$, and examine the best choice of $\phi$
Can you give a rough interpretation for the value that you see?
Solutions
Solution notebook
Overview
In this lecture we consider an extension of the job search model developed by John J. McCall
[McC70]
In the McCall model, an unemployed worker decides when to accept a permanent position at a
specified wage, given
his or her discount rate
the level of unemployment compensation
the distribution from which wage offers are drawn
In the version considered below, the wage distribution is unknown and must be learned
Based on the presentation in [LS12], section 6.6
Model features
• Infinite horizon dynamic programming with two states and one binary control
• Bayesian updating to learn the unknown distribution
Model
Let's first recall the basic McCall model [McC70] and then add the variation we want to consider
The Basic McCall Model Consider an unemployed worker who is presented in each period with
a permanent job offer at wage wt
At time t, our worker has two choices
1. Accept the offer and work permanently at constant wage wt
2. Reject the offer, receive unemployment compensation c, and reconsider next period
The wage sequence $\{w_t\}$ is iid and generated from known density h
The worker aims to maximize the expected discounted sum of earnings $\mathbb{E} \sum_{t=0}^{\infty} \beta^t y_t$
Trade-off:
Waiting too long for a good offer is costly, since the future is discounted
Accepting too early is costly, since better offers will arrive with probability one
Let V(w) denote the maximal expected discounted sum of earnings that can be obtained by an unemployed worker who starts with wage offer w in hand
The function V satisfies the recursion
$$V(w) = \max\left\{ \frac{w}{1 - \beta},\; c + \beta \int V(w')\, h(w')\, dw' \right\} \tag{3.35}$$
where the two terms on the r.h.s. are the respective payoffs from accepting and rejecting the current offer w
The optimal policy is a map from states into actions, and hence a binary function of w
Not surprisingly, it turns out to have the form $\mathbf{1}\{w \ge \bar{w}\}$, where
• $\bar{w}$ is a constant depending on $(\beta, h, c)$ called the reservation wage
• $\mathbf{1}\{w \ge \bar{w}\}$ is an indicator function returning 1 if $w \ge \bar{w}$ and 0 otherwise
• 1 indicates accept and 0 indicates reject
For further details see [LS12], section 6.3
Offer Distribution Unknown Now let's extend the model by considering the variation presented in [LS12], section 6.6
The model is as above, apart from the fact that
the density h is unknown
the worker learns about h by starting with a prior and updating based on wage offers that
he/she observes
The worker knows there are two possible distributions F and G with densities f and g
At the start of time, nature selects h to be either f or g, the wage distribution from which the entire sequence $\{w_t\}$ will be drawn
This choice is not observed by the worker, who puts prior probability $\pi_0$ on f being chosen
Update rule: the worker's time t estimate of the distribution is $\pi_t f + (1 - \pi_t) g$, where $\pi_t$ updates via
$$\pi_{t+1} = \frac{\pi_t f(w_{t+1})}{\pi_t f(w_{t+1}) + (1 - \pi_t)\, g(w_{t+1})} \tag{3.36}$$
This last expression follows from Bayes' rule, which tells us that
$$\mathbb{P}\{h = f \,|\, W = w\} = \frac{\mathbb{P}\{W = w \,|\, h = f\}\, \mathbb{P}\{h = f\}}{\mathbb{P}\{W = w\}} \quad \text{and} \quad \mathbb{P}\{W = w\} = \sum_{\omega \in \{f, g\}} \mathbb{P}\{W = w \,|\, h = \omega\}\, \mathbb{P}\{h = \omega\}$$
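In code, the update (3.36) is a one-liner; a sketch, where f and g are density functions and pi is the current belief:

def update_belief(pi, w, f, g):
    # Bayes' rule (3.36): posterior probability that h = f after observing w
    return pi * f(w) / (pi * f(w) + (1 - pi) * g(w))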
The fact that (3.36) is recursive allows us to progress to a recursive solution method
Letting
$$h_\pi(w) := \pi f(w) + (1 - \pi)\, g(w)$$
and
$$q(w, \pi) := \frac{\pi f(w)}{\pi f(w) + (1 - \pi)\, g(w)}$$
we can express the value function for the unemployed worker recursively as follows
$$V(w, \pi) = \max\left\{ \frac{w}{1 - \beta},\; c + \beta \int V(w', \pi')\, h_\pi(w')\, dw' \right\} \quad \text{where } \pi' = q(w', \pi) \tag{3.37}$$
Notice that the current guess $\pi$ is a state variable, since it affects the worker's perception of probabilities for future rewards
Parameterization Following section 6.6 of [LS12], our baseline parameterization will be
• f = Beta(1, 1) and g = Beta(3, 1.2)
• $\beta = 0.95$ and $c = 0.6$
The densities f and g have the following shape
Looking Forward What kind of optimal policy might result from (3.37) and the parameterization specified above?
Intuitively, if we accept at $w_a$ and $w_a \le w_b$, then, all other things being given, we should also accept at $w_b$
This suggests a policy of accepting whenever w exceeds some threshold value $\bar{w}$
But $\bar{w}$ should depend on $\pi$; in fact it should be decreasing in $\pi$ because
• f is a less attractive offer distribution than g
• larger $\pi$ means more weight on f and less on g
A lower assessment of future prospects makes the worker more inclined to accept a given current offer
Let's set about solving the model and see how our results match with our intuition
We begin by solving via value function iteration (VFI), which is natural but ultimately turns out
to be second best
VFI is implemented in the file odu.py contained in the QuantEcon package
The code is as follows
"""
Filename: odu.py
Authors: Thomas Sargent, John Stachurski
Solves the "Offer Distribution Unknown" Model by value function
iteration and a second faster method discussed in the corresponding
quantecon lecture.
"""
from scipy.interpolate import LinearNDInterpolator
from scipy.integrate import fixed_quad
from scipy.stats import beta as beta_distribution
from scipy import interp
from numpy import maximum as npmax
import numpy as np
class SearchProblem:
"""
A class to store a given parameterization of the "offer distribution
unknown" model.
Parameters
----------
beta : scalar(float), optional(default=0.95)
    The discount parameter
c : scalar(float), optional(default=0.6)
    The unemployment compensation
F_a : scalar(float), optional(default=1)
    First parameter of beta distribution on F
F_b : scalar(float), optional(default=1)
    Second parameter of beta distribution on F
Returns
-------
new_pi : scalar(float)
    The updated probability

"""
new_pi = 1.0 / (1 + ((1 - pi) * self.g(w)) / (pi * self.f(w)))
# Return new_pi when in [pi_min, pi_max] and else end points
new_pi = np.maximum(np.minimum(new_pi, self.pi_max), self.pi_min)
return new_pi
def bellman_operator(self, v):
    """
    The Bellman operator. Including for comparison. Value function
    iteration is not recommended for this problem. See the
    reservation wage operator below.

    Parameters
    ----------
    v : array_like(float, ndim=1, length=len(pi_grid))
        An approximate value function represented as a
        one-dimensional array.

    Returns
    -------
    new_v : array_like(float, ndim=1, length=len(pi_grid))
        The updated value function

    """
    # == Simplify names == #
    f, g, beta, c, q = self.f, self.g, self.beta, self.c, self.q

    vf = LinearNDInterpolator(self.grid_points, v)
    N = len(v)
    new_v = np.empty(N)
    for i in range(N):
        w, pi = self.grid_points[i, :]
        v1 = w / (1 - beta)
        integrand = lambda m: vf(m, q(m, pi)) * (pi * f(m)
                                                 + (1 - pi) * g(m))
        integral, error = fixed_quad(integrand, 0, self.w_max)
        v2 = c + beta * integral
        new_v[i] = max(v1, v2)
    return new_v
def get_greedy(self, v):
    """
    Compute optimal actions taking v as the value function.

    Parameters
    ----------
    v : array_like(float, ndim=1, length=len(pi_grid))
        An approximate value function represented as a
        one-dimensional array.

    Returns
    -------
    policy : array_like(float, ndim=1, length=len(pi_grid))
        The decision to accept or reject an offer where 1 indicates
        accept and 0 indicates reject

    """
    # == Simplify names == #
    f, g, beta, c, q = self.f, self.g, self.beta, self.c, self.q

    vf = LinearNDInterpolator(self.grid_points, v)
    N = len(v)
    policy = np.zeros(N, dtype=int)
    for i in range(N):
        w, pi = self.grid_points[i, :]
        v1 = w / (1 - beta)
        integrand = lambda m: vf(m, q(m, pi)) * (pi * f(m) +
                                                 (1 - pi) * g(m))
        integral, error = fixed_quad(integrand, 0, self.w_max)
        v2 = c + beta * integral
        policy[i] = v1 > v2  # Evaluates to 1 or 0
    return policy
def res_wage_operator(self, phi):
    """
    Updates the reservation wage function guess phi via the operator
    Q.

    Parameters
    ----------
    phi : array_like(float, ndim=1, length=len(pi_grid))
        This is reservation wage guess

    Returns
    -------
    new_phi : array_like(float, ndim=1, length=len(pi_grid))
        The updated reservation wage guess.

    """
    # == Simplify names == #
    beta, c, f, g, q = self.beta, self.c, self.f, self.g, self.q
    # == Turn phi into a function == #
    phi_f = lambda p: interp(p, self.pi_grid, phi)
The class SearchProblem is used to store parameters and methods needed to compute optimal
actions
The Bellman operator is implemented as the method bellman_operator(), while get_greedy()
computes an approximate optimal policy from a guess v of the value function
We will omit a detailed discussion of the code because there is a more efficient solution method
These ideas are implemented in the res_wage_operator method
Before explaining it, let's look quickly at solutions computed from value function iteration
Here's the value function:
The black line in the figure above corresponds to the function $\bar{w}(\pi)$ introduced there
It is decreasing as expected
Take 2: A More Efficient Method
It can be shown that the value of rejecting an offer in state $(w, \pi)$ depends only on $\pi$. Writing this value as $\bar{w}(\pi)/(1 - \beta)$, where $\bar{w}(\pi)$ is the reservation wage given belief $\pi$, we have
$$\frac{\bar{w}(\pi)}{1 - \beta} = c + \beta \int V(w', q(w', \pi))\, h_\pi(w')\, dw' \tag{3.38}$$
and the value function can be expressed as
$$V(w, \pi) = \max\left\{ \frac{w}{1 - \beta},\; \frac{\bar{w}(\pi)}{1 - \beta} \right\} \tag{3.39}$$
Substituting (3.39) into (3.38) gives
$$\frac{\bar{w}(\pi)}{1 - \beta} = c + \beta \int \max\left\{ \frac{w'}{1 - \beta},\; \frac{\bar{w}(q(w', \pi))}{1 - \beta} \right\} h_\pi(w')\, dw'$$
Multiplying through by $1 - \beta$ yields
$$\bar{w}(\pi) = (1 - \beta)c + \beta \int \max\left\{ w',\; \bar{w}(q(w', \pi)) \right\} h_\pi(w')\, dw' \tag{3.40}$$
Equation (3.40) can be understood as a functional equation, where $\bar{w}$ is the unknown function
Let's call it the reservation wage functional equation (RWFE)
The solution $\bar{w}$ to the RWFE is the object that we wish to compute
Solving the RWFE To solve the RWFE, we will first show that its solution is the fixed point of a
contraction mapping
To this end, let
• $b[0, 1]$ be the bounded real-valued functions on $[0, 1]$
• $\|\omega\| := \sup_{x \in [0, 1]} |\omega(x)|$
Consider the operator Q mapping $\omega \in b[0, 1]$ into $Q\omega \in b[0, 1]$ via
$$(Q\omega)(\pi) = (1 - \beta)c + \beta \int \max\left\{ w',\; \omega(q(w', \pi)) \right\} h_\pi(w')\, dw' \tag{3.41}$$
Comparing (3.40) and (3.41), we see that the set of fixed points of Q exactly coincides with the set
of solutions to the RWFE
If $Q\bar{w} = \bar{w}$ then $\bar{w}$ solves (3.40) and vice versa
Moreover, for any $\omega, \tilde{\omega} \in b[0, 1]$, basic algebra and the triangle inequality for integrals tell us that
$$|(Q\omega)(\pi) - (Q\tilde{\omega})(\pi)| \le \beta \int \left| \max\{w', \omega(q(w', \pi))\} - \max\{w', \tilde{\omega}(q(w', \pi))\} \right| h_\pi(w')\, dw' \tag{3.42}$$
Working case by case, it is easy to check that for real numbers a, b, c we always have
$$| \max\{a, b\} - \max\{a, c\} | \le |b - c| \tag{3.43}$$
Combining (3.42) and (3.43) yields
$$|(Q\omega)(\pi) - (Q\tilde{\omega})(\pi)| \le \beta \int \left| \omega(q(w', \pi)) - \tilde{\omega}(q(w', \pi)) \right| h_\pi(w')\, dw' \le \beta\, \|\omega - \tilde{\omega}\| \tag{3.44}$$
Taking the supremum over $\pi$ now gives
$$\|Q\omega - Q\tilde{\omega}\| \le \beta\, \|\omega - \tilde{\omega}\| \tag{3.45}$$
In other words, Q is a contraction of modulus $\beta$ on the complete metric space $(b[0, 1], \|\cdot\|)$
Hence
• A unique solution $\bar{w}$ to the RWFE exists in $b[0, 1]$
• $Q^k \omega \to \bar{w}$ uniformly as $k \to \infty$, for any $\omega \in b[0, 1]$
Implementation These ideas are implemented in the res_wage_operator method from odu.py
as shown above
The method corresponds to the action of the operator Q
The following exercise asks you to exploit these facts to compute an approximation to $\bar{w}$
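The facts above suggest the following sketch: iterate with Q (here passed in as a function, e.g. the res_wage_operator method shown earlier) from an arbitrary initial guess until successive iterates are close in the sup norm, as justified by (3.45):

import numpy as np

def solve_rwfe(Q, phi_init, tol=1e-6, max_iter=500):
    phi = phi_init
    for _ in range(max_iter):
        new_phi = Q(phi)
        if np.max(np.abs(new_phi - phi)) < tol:
            return new_phi
        phi = new_phi
    return phi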
Exercises
Exercise 1 Use the default parameters and the res_wage_operator method to compute an optimal policy
Your result should coincide closely with the figure for the optimal policy shown above
Try experimenting with different parameters, and confirm that the change in the optimal policy
coincides with your intuition
Solutions
Solution notebook
Overview
Next we study the standard optimal savings problem for an infinitely lived consumer: the "common ancestor" described in [LS12], section 1.3
• Also known as the income fluctuation problem
• An important sub-problem for many representative macroeconomic models, e.g., [Aiy94], [Hug93], etc.
Useful references include [Dea91], [DH10], [Kuh13], [Rab02], [Rei09] and [SE77]
Consider a household that chooses a state-contingent consumption plan $\{c_t\}_{t \ge 0}$ to maximize
$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t u(c_t)$$
subject to
$$c_t + a_{t+1} \le R a_t + z_t, \quad c_t \ge 0, \quad a_t \ge -b, \qquad t = 0, 1, \ldots \tag{3.46}$$
Here
• $\beta \in (0, 1)$ is the discount factor
• $a_t$ is asset holdings at time t, with ad-hoc borrowing constraint $a_t \ge -b$
• $c_t$ is consumption
• $z_t$ is non-capital income (wages, unemployment compensation, etc.)
• $R := 1 + r$, where $r > 0$ is the interest rate on savings
Assumptions
1. $\{z_t\}$ is a finite Markov process with Markov matrix $\Pi$ taking values in Z
2. $|Z| < \infty$ and $Z \subset (0, \infty)$
3. $r > 0$ and $\beta R < 1$
4. u is smooth, strictly increasing and strictly concave with $\lim_{c \to 0} u'(c) = \infty$ and $\lim_{c \to \infty} u'(c) = 0$
The asset space is $[-b, \infty)$ and the state is the pair $(a, z) \in S := [-b, \infty) \times Z$
A feasible consumption path from ( a, z) S is a consumption sequence {ct } such that {ct } and its
induced asset path { at } satisfy
1. ( a0 , z0 ) = ( a, z)
2. the feasibility constraints in (3.46), and
3. measurability of ct w.r.t. the filtration generated by {z1 , . . . , zt }
The meaning of the third point is just that consumption at time t can only be a function of outcomes
that have already been observed
The value function $V \colon S \to \mathbb{R}$ is defined by
$$V(a, z) := \sup\, \mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t u(c_t) \right\} \tag{3.47}$$
where the supremum is over all feasible consumption paths from ( a, z).
An optimal consumption path from ( a, z) is a feasible consumption path from ( a, z) that attains the
supremum in (3.47)
Given our assumptions, it is known that
1. For each ( a, z) S, a unique optimal consumption path from ( a, z) exists
2. This path is the unique feasible path from (a, z) satisfying the Euler equality
$$u'(c_t) = \max\left\{ \beta R\, \mathbb{E}_t[u'(c_{t+1})],\; u'(R a_t + z_t + b) \right\} \tag{3.48}$$
and the transversality condition
$$\lim_{t \to \infty} \beta^t\, \mathbb{E}\,[u'(c_t)\, a_{t+1}] = 0 \tag{3.49}$$
Moreover, there exists an optimal consumption function $c^* \colon S \to [0, \infty)$ such that the path from (a, z) generated by
$$(a_0, z_0) = (a, z), \qquad c_t = c^*(a_t, z_t) \qquad \text{and} \qquad a_{t+1} = R a_t + z_t - c_t$$
satisfies both (3.48) and (3.49), and hence is the unique optimal path from ( a, z)
In summary, to solve the optimization problem, we need to compute $c^*$
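Once an approximation to $c^*$ is in hand, simulating the implied asset path is immediate; a sketch, where c_star is a function of (a, z) and z_path is a simulated income sequence:

import numpy as np

def simulate_assets(c_star, z_path, a0, R):
    # roll the budget constraint forward at equality: a' = R a + z - c
    a = np.empty(len(z_path) + 1)
    a[0] = a0
    for t, z in enumerate(z_path):
        a[t + 1] = R * a[t] + z - c_star(a[t], z)
    return a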
Computation There are two standard ways to solve for $c^*$: value function iteration (VFI), based on the Bellman equation, and policy function iteration (PFI, or time iteration), based on the Euler equation. The Bellman operator for this problem is
$$(Tv)(a, z) = \max_{0 \le c \le Ra + z + b} \left\{ u(c) + \beta \sum_{z'} v(Ra + z - c, z')\, \Pi(z, z') \right\} \tag{3.50}$$
For PFI we use the Coleman operator K, which acts on candidate consumption policies $c \in \mathcal{C}$: the value $(Kc)(a, z)$ is the $t \in J(a, z)$ that solves
$$u'(t) = \max\left\{ \beta R \sum_{z'} u'[c(Ra + z - t, z')]\, \Pi(z, z'),\; u'(Ra + z + b) \right\} \tag{3.51}$$
where
$$J(a, z) := \{ t \in \mathbb{R} : \min Z \le t \le Ra + z + b \} \tag{3.52}$$
Iteration with K converges to $c^*$ with respect to the metric
$$\rho(c, d) := \sup_{s \in S} | u'(c(s)) - u'(d(s)) | \qquad (c, d \in \mathcal{C}) \tag{3.53}$$
We have to be careful with VFI (i.e., iterating with T) in this setting because u is not assumed to be
bounded
In fact u is typically unbounded both above and below, e.g., $u(c) = \log c$
In which case, the standard DP theory does not apply:
• $T^n v$ is not guaranteed to converge to the value function for arbitrary continuous bounded v
Nonetheless, we can always try the strategy "iterate and hope"
In this case we can check the outcome by comparing with PFI
The latter is known to converge, as described above
Implementation The code in ifp.py from QuantEcon provides implementations of both VFI and
PFI
The code is repeated here and a description and clarifications are given below
"""
Filename: ifp.py
Authors: Thomas Sargent, John Stachurski
Tools for solving the standard optimal savings / income fluctuation
problem for an infinitely lived consumer facing an exogenous income
process that evolves according to a Markov chain.
References
---------http://quant-econ.net/ifp.html
"""
import numpy as np
from scipy.optimize import fminbound, brentq
from scipy import interp
class ConsumerProblem:
"""
A class for solving the income fluctuation problem. Iteration with
either the Coleman or Bellman operators from appropriate initial
conditions leads to convergence to the optimal consumption policy.
The income process is a finite state Markov chain. Note that the
Coleman operator is the preferred method, as it is almost always
faster and more accurate. The Bellman operator is only provided for
comparison.
Parameters
----------
r : scalar(float), optional(default=0.01)
    A strictly positive scalar giving the interest rate
beta : scalar(float), optional(default=0.96)
    The discount factor, must satisfy (1 + r) * beta < 1
Pi : array_like(float), optional(default=((0.60, 0.40), (0.05, 0.95)))
The class ConsumerProblem defines methods
• bellman_operator, which implements the Bellman operator T specified above
• coleman_operator, which implements the Coleman operator K specified above
• initialize, which generates suitable initial conditions for iteration
The methods bellman_operator and coleman_operator both use linear interpolation along the asset grid to approximate the value and consumption functions
The following exercises walk you through several applications where policy functions are computed
In exercise 1 you will see that while VFI and PFI produce similar results, the latter is much faster, because we are exploiting the analytically derived first order conditions
Another benefit of working in policy function space rather than value function space is that value functions typically have more curvature, which makes them harder to approximate numerically
Exercises
Exercise 1 The first exercise is to replicate the following figure, which compares PFI and VFI as
solution methods
Consumption is shown as a function of assets with income z held fixed at its smallest value
The following details are needed to replicate the figure
The parameters are the default parameters in the definition of ConsumerProblem
The initial conditions are the default ones from initialize()
Both operators are iterated 80 times
When you run your code you will observe that iteration with K is faster than iteration with T
In the IPython shell, a comparison of the operators can be made as follows
In [1]: import quantecon as qe
In [2]: cp = qe.ConsumerProblem()
In [3]: v, c = cp.initialize()
In [4]: timeit cp.bellman_operator(v)
10 loops, best of 3: 142 ms per loop
In [5]: timeit cp.coleman_operator(c)
10 loops, best of 3: 24.9 ms per loop
We can see from the figure that the dynamics will be stable: assets do not diverge
In fact there is a unique stationary distribution of assets that we can calculate by simulation
• This can be proved via theorem 2 of [HP92]
• It represents the long run dispersion of assets across households when households have idiosyncratic shocks
Ergodicity is valid here, so stationary probabilities can be calculated by averaging over a single long time series
Hence, to approximate the stationary distribution, we can simulate a long time series for assets and histogram it, as in the following figure
For a given parameterization of the model, the mean of the stationary distribution can be interpreted as aggregate capital in an economy with a unit mass of ex-ante identical households facing
idiosyncratic shocks
Lets look at how this measure of aggregate capital varies with the interest rate and borrowing
constraint
The next figure plots aggregate capital against the interest rate for b in (1, 3)
Solution notebook
3.7 Robustness
Contents
Robustness
Overview
The Model
Constructing More Robust Policies
Robustness as Outcome of a Two-Person Zero-Sum Game
The Stochastic Case
Implementation
Application
Appendix
Overview
This lecture modifies a Bellman equation to express a decision maker's doubts about transition dynamics
His specification doubts make the decision maker want a robust decision rule
Robust means insensitive to misspecification of transition dynamics
The decision maker has a single approximating model
He calls it approximating to acknowledge that he doesn't completely trust it
He fears that outcomes will actually be determined by another model that he cannot describe
explicitly
All that he knows is that the actual data-generating model is in some (uncountable) set of models
that surrounds his approximating model
He quantifies the discrepancy between his approximating model and the genuine data-generating
model by using a quantity called entropy
(We'll explain what entropy means below)
He wants a decision rule that will work well enough no matter which of those other models actually governs outcomes
This is what it means for his decision rule to be robust to misspecification of an approximating
model
This may sound like too much to ask for, but . . .
. . . a secret weapon is available to design robust decision rules
The secret weapon is max-min control theory
A value-maximizing decision maker enlists the aid of an (imaginary) value-minimizing model
chooser to construct bounds on the value attained by a given decision rule under different models
of the transition dynamics
The original decision maker uses those bounds to construct a decision rule with an assured performance level, no matter which model actually governs outcomes
Note: In reading this lecture, please don't think that our decision maker is paranoid when he conducts a worst-case analysis. By designing a rule that works well against a worst case, his intention is to construct a rule that will work well across a set of models.
Sets of Models Imply Sets Of Values Our robust decision maker wants to know how well a
given rule will work when he does not know a single transition law . . .
. . . he wants to know sets of values that will be attained by a given decision rule F under a set of
transition laws
Ultimately, he wants to design a decision rule F that shapes these sets of values in ways that he
prefers
With this in mind, consider the following graph, which relates to a particular decision problem to
be explained below
Entropy is zero when the set includes only the approximating model, indicating that
the decision maker completely trusts the approximating model
Entropy is bigger, and the set of surrounding models is bigger, the less the decision
maker trusts the approximating model
The shaded region indicates that for all models having entropy less than or equal to the number
on the horizontal axis, the value obtained will be somewhere within the indicated set of values
Now lets compare sets of values associated with two different decision rules, Fr and Fb
In the next figure,
The red set shows the value-entropy correspondence for decision rule Fr
The blue set shows the value-entropy correspondence for decision rule Fb
For simplicity, we present ideas in the context of a class of problems with linear transition laws
and quadratic objective functions
To fit in with our earlier lecture on LQ control, we will treat loss minimization rather than value
maximization
To begin, recall the infinite horizon LQ problem, where an agent chooses a sequence of controls $\{u_t\}$ to minimize
$$\sum_{t=0}^{\infty} \beta^t \left( x_t' R x_t + u_t' Q u_t \right) \tag{3.54}$$
subject to the transition law
$$x_{t+1} = A x_t + B u_t + C w_{t+1}, \qquad t = 0, 1, 2, \ldots \tag{3.55}$$
As before,
• $x_t$ is $n \times 1$, A is $n \times n$
• $u_t$ is $k \times 1$, B is $n \times k$
• $w_t$ is $j \times 1$, C is $n \times j$
• R is $n \times n$ and Q is $k \times k$
Here $x_t$ is the state, $u_t$ is the control, and $w_t$ is a shock vector.
For now we take $\{w_t\} := \{w_t\}_{t=1}^{\infty}$ to be deterministic, a single fixed sequence
We also allow for model uncertainty on the part of the agent solving this optimization problem
In particular, the agent takes wt = 0 for all t 0 as a benchmark model, but admits the possibility
that this model might be wrong
As a consequence, she also considers a set of alternative models expressed in terms of sequences
{wt } that are close to the zero sequence
She seeks a policy that will do well enough for a set of alternative models whose members are
pinned down by sequences {wt }
Soon we'll quantify the quality of a model specification in terms of the maximal size of the expression $\sum_{t=0}^{\infty} \beta^{t+1} w_{t+1}' w_{t+1}$
Constructing More Robust Policies
If our agent takes $\{w_t\}$ as a given deterministic sequence, then, drawing on intuition from earlier lectures on dynamic programming, we can anticipate Bellman equations such as
$$J_{t-1}(x) = \min_u \left\{ x'Rx + u'Qu + \beta\, J_t(Ax + Bu + Cw_t) \right\} \tag{3.56}$$
To capture fear of misspecification, the shock is instead chosen adversarially, subject to a penalty $\theta\, w'w$ on deviations from the benchmark: for $J(x) = x'Px$, the inner problem is $\max_w \{ J(Ax + Bu + Cw) - \theta\, w'w \}$, and solving it leads to the operator
$$\mathcal{D}(P) := P + PC(\theta I - C'PC)^{-1} C'P \tag{3.58}$$
and I is a $j \times j$ identity matrix. Substituting this expression for the maximum into (3.56) yields
$$x'Px = \min_u \left\{ x'Rx + u'Qu + \beta\, (Ax + Bu)'\, \mathcal{D}(P)\, (Ax + Bu) \right\} \tag{3.59}$$
Define the operator
$$\mathcal{B}(P) := R - \beta^2 A'PB(Q + \beta B'PB)^{-1} B'PA + \beta A'PA$$
The operator $\mathcal{B}$ is the standard (i.e., non-robust) LQ Bellman operator, and $P = \mathcal{B}(P)$ is the standard matrix Riccati equation coming from the Bellman equation; see this discussion
Under some regularity conditions (see [HS08]), the operator $\mathcal{B} \circ \mathcal{D}$ has a unique positive definite fixed point, which we denote below by $\hat{P}$
A robust policy, indexed by $\theta$, is $u = -\hat{F} x$ where
$$\hat{F} := (Q + \beta B' \mathcal{D}(\hat{P}) B)^{-1} \beta B' \mathcal{D}(\hat{P}) A \tag{3.60}$$
We also define
$$\hat{K} := (\theta I - C'\hat{P}C)^{-1} C' \hat{P} (A - B\hat{F}) \tag{3.61}$$
The interpretation of $\hat{K}$ is that $w_{t+1} = \hat{K} x_t$ on the worst-case path of $\{x_t\}$, in the sense that this vector is the maximizer of the adversary's problem when the decision rule is fixed at $\hat{F}$
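A sketch of the operators D and B and the policy formula (3.60) in NumPy; the matrices and theta are taken as given, and no regularity checks are performed:

import numpy as np

def D(P, C, theta):
    # D(P) = P + P C (theta I - C' P C)^{-1} C' P, as in (3.58)
    j = C.shape[1]
    return P + P @ C @ np.linalg.solve(theta * np.eye(j) - C.T @ P @ C, C.T @ P)

def B(P, A, Bm, R, Q, beta):
    # the standard (non-robust) LQ Bellman operator described above
    PB = P @ Bm
    return R - beta**2 * A.T @ PB @ np.linalg.solve(Q + beta * Bm.T @ PB, PB.T @ A) + beta * A.T @ P @ A

def robust_F(P_hat, A, Bm, C, Q, beta, theta):
    # the robust policy (3.60), given a fixed point P_hat of B(D(.))
    DP = D(P_hat, C, theta)
    return np.linalg.solve(Q + beta * Bm.T @ DP @ Bm, beta * Bm.T @ DP @ A)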
Robustness as Outcome of a Two-Person Zero-Sum Game What we have done above can be interpreted in terms of a two-person zero-sum game in which $\hat{F}, \hat{K}$ are Nash equilibrium objects
Agent 1 is our original agent, who seeks to minimize loss in the LQ program while admitting the
possibility of misspecification
Agent 2 is an imaginary malevolent player
Agent 2s malevolence helps the original agent to compute bounds on his value function across a
set of models
We begin with agent 2s problem
Agent 2 chooses a sequence $\{w_t\}$ to maximize agent 1's loss, subject to an entropy budget $\eta \ge 0$:
$$\sum_{t=1}^{\infty} \beta^t w_t' w_t \le \eta \tag{3.62}$$
Now let F be a fixed policy, and let $J_F(x_0, w)$ be the present-value cost of that policy given sequence $w := \{w_t\}$ and initial condition $x_0 \in \mathbb{R}^n$
Substituting $-F x_t$ for $u_t$ in (3.54), this value can be written as
$$J_F(x_0, w) := \sum_{t=0}^{\infty} \beta^t x_t' (R + F'QF)\, x_t \tag{3.63}$$
where
$$x_{t+1} = (A - BF)\, x_t + C w_{t+1} \tag{3.64}$$
To enforce (3.62), agent 2 is assigned a penalty parameter $\theta$ on deviations from the benchmark, and solves
$$\max_w \sum_{t=0}^{\infty} \beta^t \left\{ x_t'(R + F'QF)\, x_t - \beta\theta\, w_{t+1}' w_{t+1} \right\}$$
or, equivalently,
$$\min_w \sum_{t=0}^{\infty} \beta^t \left\{ -x_t'(R + F'QF)\, x_t + \beta\theta\, w_{t+1}' w_{t+1} \right\} \tag{3.65}$$
subject to (3.64)
What's striking about this optimization problem is that it is once again an LQ discounted dynamic programming problem, with $w = \{w_t\}$ as the sequence of controls
The expression for the optimal policy can be found by applying the usual LQ formula (see here)
We denote it by $\mathcal{K}(F, \theta)$, with the interpretation $w_{t+1} = \mathcal{K}(F, \theta)\, x_t$
The remaining step for agent 2's problem is to set $\theta$ to enforce the constraint (3.62), which can be done by choosing $\theta = \theta_\eta$ such that
$$\sum_{t=0}^{\infty} \beta^{t+1} x_t'\, \mathcal{K}(F, \theta_\eta)' \mathcal{K}(F, \theta_\eta)\, x_t = \eta \tag{3.66}$$
Let $R_\theta(x_0, F)$ denote the minimized value in (3.65), given $\theta$ and initial condition $x_0$. At the minimizing choice of w,
$$R_\theta(x_0, F) = -\sum_{t=0}^{\infty} \beta^t x_t'(R + F'QF)\, x_t + \theta \sum_{t=0}^{\infty} \beta^{t+1} w_{t+1}' w_{t+1}$$
Rearranging, the value $-\sum_{t=0}^{\infty} \beta^t x_t'(R + F'QF)\, x_t$ of the policy F satisfies the lower bound
$$R_\theta(x_0, F) - \theta\, \text{ent} \le -\sum_{t=0}^{\infty} \beta^t x_t'(R + F'QF)\, x_t \tag{3.67}$$
where
$$\text{ent} := \sum_{t=0}^{\infty} \beta^{t+1} x_t'\, \mathcal{K}(F, \theta)' \mathcal{K}(F, \theta)\, x_t \tag{3.68}$$
To construct the lower bound on the set of values associated with all perturbations w satisfying the entropy constraint (3.62) at a given entropy level, we proceed as follows:
• For a given $\theta$, solve the minimization problem (3.65)
• Compute the minimum $R_\theta(x_0, F)$ and the associated entropy using (3.68)
• Compute the lower bound on the value function $R_\theta(x_0, F) - \theta\, \text{ent}$ and plot it against ent
• Repeat the preceding three steps for a range of values of $\theta$ to trace out the lower bound
Note: This procedure sweeps out a set of separating hyperplanes indexed by different values for
the Lagrange multiplier
The Upper Bound To construct an upper bound we use a very similar procedure
We simply replace the minimization problem (3.65) with the maximization problem
$$V_{\tilde{\theta}}(x_0, F) = \max_w \sum_{t=0}^{\infty} \beta^t \left\{ -x_t'(R + F'QF)\, x_t - \beta\tilde{\theta}\, w_{t+1}' w_{t+1} \right\} \tag{3.69}$$
At the maximizing choice of w,
$$V_{\tilde{\theta}}(x_0, F) = -\sum_{t=0}^{\infty} \beta^t x_t'(R + F'QF)\, x_t - \tilde{\theta} \sum_{t=0}^{\infty} \beta^{t+1} w_{t+1}' w_{t+1}$$
so the value of the policy F satisfies the upper bound
$$-\sum_{t=0}^{\infty} \beta^t x_t'(R + F'QF)\, x_t \le V_{\tilde{\theta}}(x_0, F) + \tilde{\theta}\, \text{ent} \tag{3.70}$$
where
$$\text{ent} = \sum_{t=0}^{\infty} \beta^{t+1} x_t'\, \mathcal{K}(F, \tilde{\theta})' \mathcal{K}(F, \tilde{\theta})\, x_t \tag{3.71}$$
To construct the upper bound on the set of values associated with all perturbations w with a given entropy we proceed much as we did for the lower bound:
• For a given $\tilde{\theta}$, solve the maximization problem (3.69)
• Compute the maximum $V_{\tilde{\theta}}(x_0, F)$ and the associated entropy using (3.71)
• Compute the upper bound on the value function $V_{\tilde{\theta}}(x_0, F) + \tilde{\theta}\, \text{ent}$ and plot it against ent
• Repeat the preceding three steps for a range of values of $\tilde{\theta}$ to trace out the upper bound
Reshaping the set of values Now in the interest of reshaping these sets of values by choosing F, we turn to agent 1's problem

Agent 1's Problem Now we turn to agent 1, who, taking agent 2's worst-case choice w_{t+1} = Kx_t as given, solves

min_{u_t} ∑_{t=0}^∞ β^t { x_t′Rx_t + u_t′Qu_t − βθ w_{t+1}′w_{t+1} }        (3.72)

Substituting w_{t+1} = Kx_t, this becomes

min_{u_t} ∑_{t=0}^∞ β^t { x_t′(R − βθK′K)x_t + u_t′Qu_t }        (3.73)

subject to

x_{t+1} = (A + CK)x_t + Bu_t        (3.74)
Once again, the expression for the optimal policy can be found here; we denote it by F̃

Nash Equilibrium Clearly the F̃ we have obtained depends on K, which, in agent 2's problem, depended on an initial policy F

Holding all other parameters fixed, we can represent this relationship as a mapping Φ, where

F̃ = Φ(K(F, θ))

The map F ↦ Φ(K(F, θ)) corresponds to a situation in which

1. agent 1 uses an arbitrary initial policy F
2. agent 2 best responds to agent 1 by choosing K(F, θ)
3. agent 1 best responds to agent 2 by choosing F̃ = Φ(K(F, θ))

As you may have already guessed, the robust policy F̂ defined in (3.60) is a fixed point of the mapping Φ

In particular, for any given θ,

1. K(F̂, θ) = K̂, where K̂ is as given in (3.61)
2. Φ(K̂) = F̂

A sketch of the proof is given in the appendix
The Stochastic Case
Now we turn to the stochastic case, where the sequence {wt } is treated as an iid sequence of
random vectors
In this setting, we suppose that our agent is uncertain about the conditional probability distribution
of wt+1
The agent takes the standard normal distribution N (0, I ) as the baseline conditional distribution,
while admitting the possibility that other nearby distributions prevail
These alternative conditional distributions of w_{t+1} might depend nonlinearly on the history x_s, s ≤ t
To implement this idea, we need a notion of what it means for one distribution to be near another
one
Here we adopt a very useful measure of closeness for distributions known as the relative entropy,
or Kullback-Leibler divergence
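For Gaussian distributions, relative entropy has a well-known closed form, which the following small sketch (our own helper, for illustration only) implements:

import numpy as np
from scipy.linalg import det, solve

def kl_gaussian(mu0, Sigma0, mu1, Sigma1):
    # D_KL(N(mu0, Sigma0) || N(mu1, Sigma1)) via the standard
    # closed form for multivariate normals
    n = len(mu0)
    diff = np.asarray(mu1) - np.asarray(mu0)
    trace_term = np.trace(solve(Sigma1, Sigma0))   # tr(Sigma1^{-1} Sigma0)
    quad_term = diff @ solve(Sigma1, diff)         # Mahalanobis-type term
    log_det_term = np.log(det(Sigma1) / det(Sigma0))
    return 0.5 * (trace_term + quad_term - n + log_det_term)

# The divergence of N(mu, I) from the benchmark N(0, I) is ||mu||^2 / 2
mu = np.array([0.5, -0.2])
print(kl_gaussian(mu, np.eye(2), np.zeros(2), np.eye(2)))  # 0.145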
In this stochastic setting, the Bellman equation for the robust problem becomes

J(x) = min_u max_{ψ∈P} { x′Rx + u′Qu + β [ ∫ J(Ax + Bu + Cw) ψ(dw) − θ D_KL(ψ, φ) ] }        (3.75)

Here P represents the set of all densities on R^n and φ is the benchmark distribution N(0, I)

The distribution ψ is chosen as the least desirable conditional distribution in terms of next period outcomes, while taking into account the penalty term θ D_KL(ψ, φ)

This penalty term plays a role analogous to the one played by the deterministic penalty θw′w in (3.56), since it discourages large deviations from the benchmark
Solving the Model The maximization problem in (3.75) appears highly nontrivial; after all, we are maximizing over an infinite-dimensional space consisting of the entire set of densities
However, it turns out that the solution is tractable, and in fact also falls within the class of normal
distributions
First, we note that J has the form J ( x ) = x 0 Px + d for some positive definite matrix P and constant
real number d
Moreover, it turns out that if (I − θ^{−1}C′PC)^{−1} is nonsingular, then

max_{ψ∈P} { ∫ (Ax + Bu + Cw)′P(Ax + Bu + Cw) ψ(dw) − θ D_KL(ψ, φ) } = (Ax + Bu)′D(P)(Ax + Bu) + κ(θ, P)        (3.76)

where

κ(θ, P) := θ ln[det(I − θ^{−1}C′PC)^{−1}]        (3.77)

Substituting the expression for the maximum into Bellman equation (3.75) and using J(x) = x′Px + d gives

x′Px + d = min_u { x′Rx + u′Qu + β(Ax + Bu)′D(P)(Ax + Bu) } + β[d + κ(θ, P)]        (3.78)

Since constant terms do not affect minimizers, the solution is the same as (3.59), leading to

x′Px + d = x′B(D(P))x + β[d + κ(θ, P)]

To solve this Bellman equation, we take P̂ to be the positive definite fixed point of B ∘ D
In addition, the constant d̂ solves d = β[d + κ(θ, P̂)], so that

d̂ := β/(1 − β) κ(θ, P̂)        (3.79)

The robust policy in this stochastic case is the minimizer in (3.78), which is once again u = −F̂x for F̂ given by (3.60)

Substituting the robust policy into (3.77) we obtain the worst-case shock distribution:

w_{t+1} ∼ N( K̂x_t, (I − θ^{−1}C′P̂C)^{−1} )

where K̂ is given by (3.61)

Note that the mean of the worst-case shock distribution is equal to the same worst-case w_{t+1} as in the earlier deterministic setting
Computing Other Quantities Before turning to implementation, we briefly outline how to compute several other quantities of interest
Worst-Case Value of a Policy One thing we will be interested in doing is holding a policy fixed
and computing the discounted loss associated with that policy
So let F be a given policy and let J_F(x) be the associated loss, which, by analogy with (3.75), satisfies

J_F(x) = max_{ψ∈P} { x′(R + F′QF)x + β [ ∫ J_F((A − BF)x + Cw) ψ(dw) − θ D_KL(ψ, φ) ] }

Writing J_F(x) = x′P_F x + d_F and applying the same argument used to derive (3.76), we get

x′P_F x + d_F = x′(R + F′QF)x + β [ x′(A − BF)′D(P_F)(A − BF)x + d_F + κ(θ, P_F) ]

To solve this we take P_F to be the fixed point

P_F = R + F′QF + β(A − BF)′D(P_F)(A − BF)

and

d_F := β/(1 − β) κ(θ, P_F) = β/(1 − β) θ ln[det(I − θ^{−1}C′P_F C)^{−1}]        (3.80)

If you skip ahead to the appendix, you will be able to verify that −P_F is the solution to the Bellman equation in agent 2's problem discussed above; we use this in our computations
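A minimal sketch of computing P_F by fixed-point iteration, reusing the d_operator helper from the sketch earlier in this lecture:

def worst_case_value_matrix(F, A, B, C, Q, R, beta, theta,
                            tol=1e-9, max_iter=10_000):
    # Iterate P <- R + F'QF + beta (A - BF)' D(P) (A - BF) to a fixed
    # point; convergence presumes theta is above the breakdown point
    ABF = A - B @ F
    P = np.zeros_like(R)
    for _ in range(max_iter):
        P_new = R + F.T @ Q @ F + beta * ABF.T @ d_operator(P, C, theta) @ ABF
        if np.max(np.abs(P_new - P)) < tol:
            return P_new
        P = P_new
    raise ValueError("iteration failed to converge")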
Implementation
The QuantEcon package provides a class called RBLQ for implementation of robust LQ optimal
control
Here's the relevant code, from the file robustlq.py
"""
Filename: robustlq.py
Authors: Chase Coleman, Spencer Lyon, Thomas Sargent, John Stachurski
Solves robust LQ control problems.
"""
from __future__ import division  # Remove for Python 3.x
import numpy as np
from .lqcontrol import LQ
from .quadsums import var_quadratic_sum
from numpy import dot, log, sqrt, identity, hstack, vstack, trace
from scipy.linalg import solve, inv, det
from .matrix_eqn import solve_discrete_lyapunov
class RBLQ:
    r"""
    Provides methods for analysing infinite horizon robust LQ control
    problems of the form

    .. math::

        min_{u_t}  sum_t  beta^t {x_t' R x_t + u_t' Q u_t }

    subject to

    .. math::

        x_{t+1} = A x_t + B u_t + C w_{t+1}

    and with model misspecification parameter theta.

    Parameters
    ----------
    Q : array_like(float, ndim=2)
        The cost (payoff) matrix for the controls. See above for more.
        Q should be k x k and symmetric and positive definite
    R : array_like(float, ndim=2)
        The cost (payoff) matrix for the state. See above for more. R
        should be n x n and symmetric and non-negative definite
    A : array_like(float, ndim=2)
        The matrix that corresponds with the state in the state space
        system. A should be n x n
    B : array_like(float, ndim=2)
        The matrix that corresponds with the control in the state space
        system. B should be n x k
    C : array_like(float, ndim=2)
        The matrix that corresponds with the random process in the
        state space system. C should be n x j
    beta : scalar(float)
        The discount factor in the robust control problem
    theta : scalar(float)
        The robustness factor in the robust control problem

    """

    ...

    def b_operator(self, P):
        r"""
        The B operator, mapping P into

        .. math::

            B(P) := R - beta^2 A'PB(Q + beta B'PB)^{-1}B'PA + beta A'PA

        and also returning

        .. math::

            F := (Q + beta B'PB)^{-1} beta B'PA

        Parameters
        ----------
        P : array_like(float, ndim=2)
            A matrix that should be n x n

        Returns
        -------
        F : array_like(float, ndim=2)
            The F matrix as defined above
        new_p : array_like(float, ndim=2)
            The matrix P after applying the B operator
        """
        A, B, Q, R, beta = self.A, self.B, self.Q, self.R, self.beta
        S1 = Q + beta * dot(B.T, dot(P, B))
        S2 = beta * dot(B.T, dot(P, A))
        S3 = beta * dot(A.T, dot(P, A))
        F = solve(S1, S2)
        new_P = R - dot(S2.T, solve(S1, S2)) + S3

        return F, new_P
    def robust_rule(self):
        """
        This method solves the robust control problem by tricking it
        into a stacked LQ problem, as described in chapter 2 of Hansen-
        Sargent's text "Robustness."  The optimal control with observed
        state is

        .. math::

            u_t = - F x_t

        And the value function is -x'Px

        Returns
        -------
        F : array_like(float, ndim=2)
            The optimal control matrix from above
        P : array_like(float, ndim=2)
            The value function matrix
        """

        ...

    def K_to_F(self, K):
        """
        Given a fixed worst-case matrix K, solve agent 1's problem as an
        ordinary LQ problem with A1 = A + CK and R1 = R - beta*theta*K'K.

        Parameters
        ----------
        K : array_like(float, ndim=2)
            A j x n array
        Returns
        -------
        F : array_like(float, ndim=2)
            The policy function for a given K
        P : array_like(float, ndim=2)
            The value function for a given K
        """
        A1 = self.A + dot(self.C, K)
        B1 = self.B
        Q1 = self.Q
        R1 = self.R - self.beta * self.theta * dot(K.T, K)
        lq = LQ(Q1, R1, A1, B1, beta=self.beta)
        P, F, d = lq.stationary_values()

        return F, P
    def compute_deterministic_entropy(self, F, K, x0):
        """
        Given K and F, compute the value of deterministic entropy, which
        is sum_t beta^t x_t' K'K x_t with x_{t+1} = (A - BF + CK) x_t.

        Parameters
        ----------
        F : array_like(float, ndim=2)
            The policy function, a k x n array
        K : array_like(float, ndim=2)
            The worst case matrix, a j x n array
        x0 : array_like(float, ndim=1)
            The initial condition for state

        Returns
        -------
        e : scalar(float)
            The deterministic entropy
        """
        H0 = dot(K.T, K)
        C0 = np.zeros((self.n, 1))
        A0 = self.A - dot(self.B, F) + dot(self.C, K)
        e = var_quadratic_sum(A0, C0, H0, self.beta, x0)

        return e
    def evaluate_F(self, F):
        """
        Given a fixed policy F, with the interpretation u = -F x, this
        function computes the matrix P_F and constant d_F associated
        with discounted cost J_F(x) = x' P_F x + d_F.
        Parameters
        ----------
        F : array_like(float, ndim=2)
            The policy function, a k x n array

        Returns
        -------
        P_F : array_like(float, ndim=2)
            Matrix for discounted cost
        d_F : scalar(float)
            Constant for discounted cost
        K_F : array_like(float, ndim=2)
            Worst case policy
        O_F : array_like(float, ndim=2)
            Matrix for discounted entropy
        o_F : scalar(float)
            Constant for discounted entropy
        """
        # == Simplify names == #
        Q, R, A, B, C = self.Q, self.R, self.A, self.B, self.C
        beta, theta = self.beta, self.theta

        # == Solve for policies and costs using agent 2's problem == #
        K_F, P_F = self.F_to_K(F)
        I = np.identity(self.j)
        H = inv(I - C.T.dot(P_F.dot(C)) / theta)
        d_F = log(det(H))

        # == Compute O_F and o_F == #
        sig = -1.0 / theta
        AO = sqrt(beta) * (A - dot(B, F) + dot(C, K_F))
        O_F = solve_discrete_lyapunov(AO.T, beta * dot(K_F.T, K_F))
        ho = (trace(H - 1) - d_F) / 2.0
        tr = trace(dot(O_F, C.dot(H.dot(C.T))))
        o_F = (ho + beta * tr) / (1 - beta)

        return K_F, P_F, d_F, O_F, o_F
Application
Let us consider a monopolist similar to this one, but now facing model uncertainty

The inverse demand function is p_t = a_0 − a_1 y_t + d_t, where

d_{t+1} = ρ d_t + σ_d w_{t+1},    {w_t} iid N(0, 1)

and where the monopolist maximizes the expected discounted sum of profits with per-period profit

p_t y_t − γ (y_{t+1} − y_t)² / 2 − c y_t

To map this into our LQ framework, take the state to be x_t = (1, y_t, d_t)′ and the control to be u_t = y_{t+1} − y_t, so that, with b := (a_0 − c)/2,

R = − [ 0, b, 0;  b, −a_1, 1/2;  0, 1/2, 0 ]    and    Q = γ/2

and

A = [ 1, 0, 0;  0, 1, 0;  0, 0, ρ ],    B = [ 0; 1; 0 ],    C = [ 0; 0; σ_d ]
The code for producing the graph shown above, with blue being for the robust policy, is given in
examples/robust_monopolist.py
We repeat it here for convenience
"""
Filename: robust_monopolist.py
Authors: Chase Coleman, Spencer Lyon, Thomas Sargent, John Stachurski
The robust control problem for a monopolist with adjustment costs.
inverse demand curve is:
The
=
=
=
=
=
=
=
100
0.5
0.9
0.05
0.95
2
50.0
theta = 0.002
ac
= (a_0 - c) / 2.0
# == Define LQ matrices == #
R = np.array([[0., ac, 0.],
[ac, -a_1, 0.5],
[0., 0.5, 0.]])
R = -R  # For minimization
Q = gamma / 2

A = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., rho]])
B = np.array([[0.],
              [1.],
              [0.]])
C = np.array([[0.],
              [0.],
              [sigma_d]])

# -------------------------------------------------------------------------- #
#                                 Functions
# -------------------------------------------------------------------------- #
def evaluate_policy(theta, F):
    """
    Given theta (scalar, dtype=float) and policy F (array_like), returns
    the value associated with that policy under the worst case path for
    {w_t}, as well as the entropy level.
    """
    rlq = qe.robustlq.RBLQ(Q, R, A, B, C, beta, theta)
    K_F, P_F, d_F, O_F, o_F = rlq.evaluate_F(F)
    x0 = np.array([[1.], [0.], [0.]])
    value = - x0.T.dot(P_F.dot(x0)) - d_F
    entropy = x0.T.dot(O_F.dot(x0)) + o_F
    return list(map(float, (value, entropy)))


def value_and_entropy(emax, F, bw, grid_size=1000):
    """
    Compute the value function and entropy levels for a theta path
    increasing until it reaches the specified target entropy value.

    Parameters
    ==========
    emax : scalar
        The target entropy value
    F : array_like
        The policy function to be evaluated
    bw : str
        A string specifying whether the implied shock path follows best
        or worst assumptions. The only acceptable values are 'best' and
        'worst'.

    Returns
    =======
    df : pd.DataFrame
        A pandas DataFrame containing the value function and entropy
        values up to the emax parameter.  The columns are 'value' and
        'entropy'.
    """
    if bw == 'worst':
        thetas = 1 / np.linspace(1e-8, 1000, grid_size)
    else:
        thetas = -1 / np.linspace(1e-8, 1000, grid_size)

    df = pd.DataFrame(index=thetas, columns=('value', 'entropy'))

    for theta in thetas:
        df.ix[theta] = evaluate_policy(theta, F)
        if df.ix[theta, 'entropy'] >= emax:
            break

    df = df.dropna(how='any')
    return df
# -------------------------------------------------------------------------- #
#                                    Main
# -------------------------------------------------------------------------- #

# == Compute the optimal rule == #
optimal_lq = qe.lqcontrol.LQ(Q, R, A, B, C, beta)
Po, Fo, do = optimal_lq.stationary_values()

# == Compute a robust rule given theta == #
baseline_robust = qe.robustlq.RBLQ(Q, R, A, B, C, beta, theta)
Fb, Kb, Pb = baseline_robust.robust_rule()

# == Check the positive definiteness of worst-case covariance matrix to == #
# == ensure that theta exceeds the breakdown point                      == #
test_matrix = np.identity(Pb.shape[0]) - np.dot(C.T, Pb.dot(C)) / theta
eigenvals, eigenvecs = eig(test_matrix)
assert (eigenvals >= 0).all(), 'theta below breakdown point.'

emax = 1.6e6

optimal_best_case = value_and_entropy(emax, Fo, 'best')
robust_best_case = value_and_entropy(emax, Fb, 'best')
optimal_worst_case = value_and_entropy(emax, Fo, 'worst')
robust_worst_case = value_and_entropy(emax, Fb, 'worst')

fig, ax = plt.subplots()

ax.set_xlim(0, emax)
ax.set_ylabel("Value")
ax.set_xlabel("Entropy")
ax.grid()

for axis in 'x', 'y':
    plt.ticklabel_format(style='sci', axis=axis, scilimits=(0, 0))

plot_args = {'lw': 2, 'alpha': 0.7}
colors = 'r', 'b'

df_pairs = ((optimal_best_case, optimal_worst_case),
            (robust_best_case, robust_worst_case))


class Curve(object):

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __call__(self, z):
        return interp(z, self.x, self.y)


for c, df_pair in zip(colors, df_pairs):
    curves = []
    for df in df_pair:
        # == Plot curves == #
        x, y = df['entropy'], df['value']
        x, y = (np.asarray(a, dtype='float') for a in (x, y))
        egrid = np.linspace(0, emax, 100)
        curve = Curve(x, y)
        print(ax.plot(egrid, curve(egrid), color=c, **plot_args))
        curves.append(curve)
    # == Color fill between curves == #
    ax.fill_between(egrid,
                    curves[0](egrid),
                    curves[1](egrid),
                    color=c, alpha=0.1)

plt.show()
Appendix We sketch the proof only of the first claim in this section, which is that, for any given θ, K(F̂, θ) = K̂, where K̂ is as given in (3.61)

This is the content of the next lemma

Lemma. If P̂ is the fixed point of the map B ∘ D and F̂ is the robust policy as given in (3.60), then

K(F̂, θ) = (θI − C′P̂C)^{−1}C′P̂(A − BF̂)        (3.81)
Dynamic Stackelberg Problems
Overview
Previous lectures including the linear regulator and the rational expectations equilibrium lectures have
studied decision problems that are recursive in what we can call natural state variables, such as
stocks of capital (physical, financial and human)
wealth
information that helps forecast future prices and quantities that impinge on future payoffs
In problems that are recursive in the natural state variables, optimal decision rules are functions
of the natural state variables
In this lecture, we describe an important class of problems that are not recursive in the natural
state variables
Kydland and Prescott [KP77], [Pre77] and Calvo [Cal78] gave examples of such decision problems,
which have the following features
The time t ≥ 0 actions of some decision makers depend on the time s ≥ t decisions of another decision maker called a government or a leader

At time t = 0, the government or leader chooses his actions for all times s ≥ 0
In this sense, the government or leader is said to commit to a plan
In these problems, variables that encode history dependence appear in optimal decision rules of the
government or leader
Furthermore, there are distinct optimal decision rules for time t = 0, on the one hand, and times t ≥ 1, on the other hand

The decision rules for t = 0 and t ≥ 1 have distinct state variables
This property of the decision rules is called the time inconsistency of optimal plans
An expression of time inconsistency is that optimal decision rules are not recursive in natural state variables

Examples of time inconsistent optimal rules are those of a large agent (e.g., a government) who confronts a competitive market composed of many small private agents, and in which the private agents' decisions at each date are influenced by their forecasts of the government's future actions

In such settings, the natural state variables of private agents at time t are partly shaped by past decisions that were influenced by their earlier forecasts of the government's action at time t

The rational expectations equilibrium concept plays an essential role

It means that in choosing its future actions, the government or leader effectively chooses the followers' expectations about them

The government or leader realizes and exploits that fact

In a rational expectations equilibrium, the government must confirm private agents' earlier forecasts of the government's time t actions

The requirement to confirm prior private sector forecasts puts constraints on the government's time t decisions that prevent its problem from being recursive in natural state variables

These additional constraints make the government's decision rule at t depend on the entire history of the natural state variables from time 0 to time t

An important lesson to be taught in this lecture is that if the natural state variables are augmented with additional state variables that measure costs in terms of the government's current continuation value of now confirming past private sector expectations about its current behavior, this class of problems can be made recursive

This fact affords significant computational advantages and yields substantial insights

This lecture displays these principles within the tractable framework of linear quadratic problems

It is based on chapter 19 of [LS12]
The Stackelberg problem

We use the optimal linear regulator to solve a linear quadratic version of what is known as a dynamic Stackelberg problem

For now we refer to the Stackelberg leader as the government and the Stackelberg follower as the representative agent or private sector

Soon we'll give an application with another interpretation of these two decision makers

Let z_t be an n_z × 1 vector of natural state variables, x_t an n_x × 1 vector of endogenous variables free to jump at t, and u_t a vector of government instruments

The z_t vector is inherited from the past

But x_t is not inherited from the past
The government's one-period return function is

r(y_t, u_t) = −(y_t′Ry_t + u_t′Qu_t)        (3.83)

where y_t := [z_t; x_t]

Subject to an initial condition for z_0, but not for x_0, a government wants to maximize

∑_{t=0}^∞ β^t r(y_t, u_t)        (3.84)

The government makes policy in light of the model

[ I, 0; G_21, G_22 ] [ z_{t+1}; x_{t+1} ] = [ Â_11, Â_12; Â_21, Â_22 ] [ z_t; x_t ] + B̂u_t        (3.85)

We assume that the matrix on the left is invertible, so that we can multiply both sides of the above equation by its inverse to obtain

[ z_{t+1}; x_{t+1} ] = [ A_11, A_12; A_21, A_22 ] [ z_t; x_t ] + Bu_t        (3.86)

or

y_{t+1} = Ay_t + Bu_t        (3.87)

The problem assumes that there are no cross products between states and controls in the return function. There is a simple transformation that converts a problem whose return function has cross products into an equivalent problem that has no cross products. For example, see [HS08] (chapter 4, pp. 72-73).
We now describe a handy four-step algorithm for solving the Stackelberg problem

Step 1: solve an optimal linear regulator Step 1 seems to disregard the forward-looking aspect of the problem (step 3 will take account of that)

If we temporarily ignore the fact that the x_0 component of the state y_0 = [z_0; x_0] is not actually part of the true state vector, then superficially the Stackelberg problem (3.84), (3.87) has the form of an optimal linear regulator problem

It can be solved by forming a Bellman equation and iterating until it converges

2 The government would make different choices were it to choose sequentially, that is, were it to select its time t action at time t. See the lecture on history dependent policies
The optimal value function has the form v(y) = −y′Py, where P satisfies the Riccati equation (3.92)

The next steps note how the value function v(y) = −y′Py encodes objects that solve the Stackelberg problem, then tell how to decode them

A reader not wanting to be reminded of the details of the Bellman equation can now move directly to step 2

For those wanting a reminder, here it is

The linear regulator is

v(y_0) = −y_0′Py_0 = max_{u_t, y_{t+1}} −∑_{t=0}^∞ β^t ( y_t′Ry_t + u_t′Qu_t )        (3.88)

where the maximization is subject to a fixed initial condition for y_0 and the law of motion 3

y_{t+1} = Ay_t + Bu_t        (3.89)

The corresponding Bellman equation is

v(y) = max_{u, y*} { −y′Ry − u′Qu + βv(y*) }        (3.90)

subject to

y* = Ay + Bu        (3.91)

where y* denotes next period's value of the state. Problem (3.90) gives rise to the matrix Riccati equation

P = R + βA′PA − β²A′PB(Q + βB′PB)^{−1}B′PA        (3.92)

and the formula for F in the decision rule u_t = −Fy_t

F = β(Q + βB′PB)^{−1}B′PA        (3.93)

Thus, we can solve problem (3.90), (3.87) by iterating to convergence on the difference equation counterpart to the algebraic Riccati equation (3.92)
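A minimal sketch of this iteration (our own helper function, assuming the matrices A, B, Q, R are given as NumPy arrays and beta is a scalar):

import numpy as np

def solve_riccati(A, B, Q, R, beta, tol=1e-10, max_iter=100_000):
    # Iterate on the difference-equation counterpart to the algebraic
    # Riccati equation (3.92), then recover F from (3.93)
    P = np.zeros_like(R)
    for _ in range(max_iter):
        BPB = Q + beta * B.T @ P @ B
        BPA = B.T @ P @ A
        P_new = R + beta * A.T @ P @ A \
                - beta**2 * A.T @ P @ B @ np.linalg.solve(BPB, BPA)
        if np.max(np.abs(P_new - P)) < tol:
            break
        P = P_new
    F = beta * np.linalg.solve(Q + beta * B.T @ P_new @ B, B.T @ P_new @ A)
    return F, P_new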
Step 2: use the stabilizing properties of shadow price Py_t At this point, we decode the information in the matrix P in terms of shadow prices that are associated with a Lagrangian

At this point we note that we can solve a linear quadratic control problem of the form (3.84), (3.87) by attaching a sequence of Lagrange multipliers 2β^{t+1}μ_{t+1} to the sequence of constraints (3.87) and then forming the Lagrangian:

L = −∑_{t=0}^∞ β^t [ y_t′Ry_t + u_t′Qu_t + 2βμ_{t+1}′(Ay_t + Bu_t − y_{t+1}) ]        (3.94)

For the Stackelberg problem, it is important to partition μ_t conformably with our partition of y_t = [z_t; x_t], so that μ_t = [μ_zt; μ_xt]

3 In step 4, we acknowledge that the x_0 component is not given but is to be chosen by the Stackelberg leader.
The first-order conditions for maximizing the Lagrangian (3.94) with respect to u_t and y_t, t ≥ 1, are

0 = Qu_t + βB′μ_{t+1}        (3.95)

μ_t = Ry_t + βA′μ_{t+1}        (3.96)

Solving (3.95) for u_t and substituting into the law of motion (3.87), the system can be arranged as

[ y_{t+1}; μ_{t+1} ] = N [ y_t; μ_t ]        (3.97)

We seek a stabilizing solution of (3.97), i.e., one that satisfies

∑_{t=0}^∞ β^t y_t′y_t < +∞
Stabilizing solution A stabilizing solution satisfies μ_0 = Py_0, where P solves the matrix Riccati equation (3.92)

The solution for μ_0 replicates itself over time in the sense that

μ_t = Py_t        (3.98)

Appendix A verifies that the matrix P that satisfies the Riccati equation (3.92) is the same P that defines the stabilizing initial conditions (y_0, Py_0)
Step 3: convert implementation multipliers into state variables

Key insight We now confront the fact that the x_0 component of y_0 consists of variables that are not state variables

This means that they are not inherited from the past but are to be determined at time t

By way of contrast, in the optimal linear regulator problem, y_0 is a state vector inherited from the past; the multiplier μ_0 jumps at t = 0 to satisfy μ_0 = Py_0 and thereby stabilize the system

But in the Stackelberg problem, pertinent components of both y_0 and μ_0 must adjust to satisfy μ_0 = Py_0
In particular, partition μ_t conformably with the partition of y_t into [z_t′ x_t′]′ to get 4

μ_t = [ μ_zt; μ_xt ]

For the Stackelberg problem, the first n_z elements of y_t are predetermined but the remaining components are free

And while the first n_z elements of μ_t are free to jump at t, the remaining components are not
The third step completes the solution of the Stackelberg problem by acknowledging these facts
After we have performed the key step of computing the matrix P that solves the Riccati equation (3.92), we convert the last n_x Lagrange multipliers μ_xt into state variables by using the following procedure
Write the last n_x equations of (3.98) as

μ_xt = P_21 z_t + P_22 x_t,        (3.99)

where the partitioning of P is conformable with that of y_t into [z_t′ x_t′]′

The vector μ_xt becomes part of the state at t, while x_t is free to jump at t

Therefore, we solve (3.99) for x_t in terms of (z_t, μ_xt):

x_t = −P_22^{−1}P_21 z_t + P_22^{−1}μ_xt        (3.100)
Then we can write

y_t = [ z_t; x_t ] = [ I, 0; −P_22^{−1}P_21, P_22^{−1} ] [ z_t; μ_xt ]        (3.101)

and, from (3.98),

μ_xt = [ P_21  P_22 ] y_t        (3.102)

With these modifications, the key formulas (3.93) and (3.92) from the optimal linear regulator for F and P, respectively, continue to apply

Using (3.101), the optimal decision rule is

u_t = −F [ I, 0; −P_22^{−1}P_21, P_22^{−1} ] [ z_t; μ_xt ]        (3.103)

4 This argument just adapts one in [Pea92]. The Lagrangian associated with the Stackelberg problem remains (3.94), which means that the stabilizing solution must satisfy (3.98). It is only in how we impose (3.98) that the solution diverges from that for the linear regulator.
Step 4: solve for x_0 and μ_x0 The Stackelberg leader is free to choose the jump variables x_0, and does so to maximize the value −y_0′Py_0

The first-order condition for this choice is

−2P_21 z_0 − 2P_22 x_0 = 0,        (3.104)

which by virtue of (3.99) is equivalent with

μ_x0 = 0        (3.105)

The Lagrange multiplier μ_xt measures the cost to the Stackelberg leader at t ≥ 0 of confirming expectations about its time t action that the followers had held at dates s < t

Setting μ_x0 = 0 means that at time 0 there are no such prior expectations to confirm

Notice the important role of the rational expectations equilibrium concept here
Summary We solve the Stackelberg problem by

1. formulating a particular optimal linear regulator,
2. solving the associated matrix Riccati equation (3.92) for P,
3. computing F,
4. then partitioning P to obtain representations (3.101)-(3.103), as in the sketch below
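Here is a minimal sketch of these steps in code; the function is ours, for illustration, and reuses the solve_riccati helper sketched above:

import numpy as np

def stackelberg_plan(A, B, Q, R, beta, nz):
    # Step 1: solve the Riccati equation for (F, P)
    F, P = solve_riccati(A, B, Q, R, beta)
    n = A.shape[0]
    # Step 3: partition P conformably with (z_t, x_t)
    P21, P22 = P[nz:, :nz], P[nz:, nz:]
    P22inv = np.linalg.inv(P22)
    # Map (z_t, mu_xt) into y_t = (z_t, x_t), as in (3.101)
    M = np.vstack([
        np.hstack([np.eye(nz), np.zeros((nz, n - nz))]),
        np.hstack([-P22inv @ P21, P22inv]),
    ])
    # Decision rule u_t = F_star @ (z_t, mu_xt), cf. (3.103);
    # step 4 initializes mu_x0 = 0
    F_star = -F @ M
    return F_star, P, M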
History-dependent representation of decision rule For some purposes, it is useful to eliminate the implementation multipliers μ_xt and to express the decision rule for u_t as a function of z_t, z_{t−1}, and u_{t−1}

This can be accomplished as follows. First, represent the law of motion for (z_t, μ_xt) under the optimal rule compactly as

[ z_{t+1}; μ_{x,t+1} ] = [ m_11, m_12; m_21, m_22 ] [ z_t; μ_xt ]        (3.107)

and write the decision rule as

u_t = f_11 z_t + f_12 μ_xt        (3.108)
Then, where f_12^{−1} denotes the generalized inverse of f_12, (3.108) implies μ_{x,t} = f_12^{−1}(u_t − f_11 z_t)

Equate the right side of this expression to the right side of the second line of (3.107) lagged once, and rearrange by using (3.108) lagged once to eliminate μ_{x,t−1}, to get

u_t = f_12 m_22 f_12^{−1} u_{t−1} + f_11 z_t + f_12 (m_21 − m_22 f_12^{−1} f_11) z_{t−1}        (3.109)
or

u_t = ρ u_{t−1} + σ_0 z_t + σ_1 z_{t−1}        (3.110)

for t ≥ 1

For t = 0, the initialization μ_{x,0} = 0 implies that

u_0 = f_11 z_0        (3.111)

By making the instrument feed back on itself, the form of (3.110) potentially allows for instrument-smoothing to emerge as an optimal rule under commitment

Please notice how the decision rule for t ≥ 1 differs from the decision rule for t = 0 because of the presence in general of a nonzero μ_xt in the decision rule (3.108) for t ≥ 1

As indicated at the beginning of this lecture, this is a symptom of the time inconsistency of the optimal plan
A large firm with a competitive fringe
As an example, this section studies the equilibrium of an industry with a large firm that acts as a
Stackelberg leader with respect to a competitive fringe
Sometimes the large firm is called the monopolist even though there are actually many firms in
the industry
The industry produces a single nonstorable homogeneous good, the quantity of which is chosen
in the previous period
One large firm produces Qt and a representative firm in a competitive fringe produces qt
The representative firm in the competitive fringe acts as a price taker and chooses sequentially
The large firm commits to a policy at time 0, taking into account its ability to manipulate the price
sequence, both directly through the effects of its quantity choices on prices, and indirectly through
the responses of the competitive fringe to its forecasts of prices 6
The costs of production are C_t = eQ_t + .5gQ_t² + .5c(Q_{t+1} − Q_t)² for the large firm and σ_t = dq_t + .5hq_t² + .5c(q_{t+1} − q_t)² for the competitive firm, where d > 0, e > 0, c > 0, g > 0, h > 0 are cost parameters

There is a linear inverse demand curve

p_t = A_0 − A_1(Q_t + q̄_t) + v_t,        (3.112)

where the demand shock v_t follows

v_{t+1} = ρv_t + C_ε ε_{t+1},        (3.113)

and where |ρ| < 1 and ε_{t+1} is an IID sequence of random variables with mean zero and variance 1

In (3.112), q̄_t is equilibrium output of the representative competitive firm

In equilibrium, q̄_t = q_t, but we must distinguish between q_t and q̄_t in posing the optimum problem of a competitive firm

6 [HS08] (chapter 16) uses this model as a laboratory to illustrate an equilibrium concept featuring robustness in which at least one of the agents has doubts about the stochastic specification of the demand shock process.
Taking prices as given, the representative competitive firm chooses an output plan to maximize

E_0 ∑_{t=0}^∞ β^t { p_t q_t − σ_t },    β ∈ (0, 1)        (3.114)

Letting i_t := q_{t+1} − q_t, the firm's first-order condition is

i_t = E_t β [ i_{t+1} − c^{−1}h q_{t+1} + c^{−1}(p_{t+1} − d) ]        (3.115)

for t ≥ 0

We appeal to a certainty equivalence principle to justify working with a non-stochastic version of (3.115) formed by dropping the expectation operator and the random term ε_{t+1} from (3.113)

We use a method of [Sar79] and [Tow83] 7

We shift (3.112) forward one period, replace conditional expectations with realized values, use (3.112) to substitute for p_{t+1} in (3.115), and set q_t = q̄_t for all t ≥ 0 to get

i_t = β i_{t+1} − c^{−1}βh q̄_{t+1} + c^{−1}β(A_0 − d) − c^{−1}βA_1 q̄_{t+1} − c^{−1}βA_1 Q_{t+1} + c^{−1}β v_{t+1}        (3.116)

Given sufficiently stable sequences {Q_t, v_t}, we could solve (3.116) and i_t = q̄_{t+1} − q̄_t to express the competitive fringe's output sequence as a function of the (tail of the) monopolist's output sequence

The dependence of i_t on future Q_t's opens an avenue for the monopolist to influence current outcomes by its choice now of its future actions

It is this feature that makes the monopolist's problem fail to be recursive in the natural state variables q̄, Q

The monopolist arrives at period t > 0 facing the constraint that it must confirm the expectations about its time t decision upon which the competitive fringe based its decisions at dates before t
The monopolist's problem The monopolist views the competitive firm's sequence of Euler equations as constraints on its own opportunities

They are implementability constraints on the monopolist's choices

Including the implementability constraints, we can represent the constraints in terms of a transition law impinging on the monopolist's state vector (1, v_t, Q_t, q̄_t, i_t)′, where u_t = Q_{t+1} − Q_t is the control of the monopolist

The last row of this transition law portrays the implementability constraint (3.116)
7 They used this method to compute a rational expectations competitive equilibrium. The key step was to eliminate price and output by substituting from the inverse demand curve and the production function into the firm's first-order conditions to get a difference equation in capital.
Represent this transition law as

y_{t+1} = Ay_t + Bu_t        (3.117)

Although we have entered the competitive fringe's choice variable i_t as a component of the state y_t in the monopolist's transition law (3.117), it is actually a jump variable

Nevertheless, the analysis above implies that the solution of the large firm's problem is encoded in the Riccati equation associated with (3.117) as the transition law

Let's decode it

To match our general setup, we partition y_t as y_t′ = [z_t′ x_t′] where z_t′ = [1 v_t Q_t q̄_t] and x_t = i_t
The monopolist's problem is

max_{u_t, i_t} ∑_{t=0}^∞ β^t { p_t Q_t − C_t }

subject to the given initial condition for z_0, equations (3.112) and (3.116) and i_t = q̄_{t+1} − q̄_t, as well as the laws of motion of the natural state variables z

Notice that the monopolist in effect chooses the price sequence, as well as the quantity sequence of the competitive fringe, albeit subject to the restrictions imposed by the behavior of consumers, as summarized by the demand curve (3.112) and the implementability constraint (3.116) that describes the best responses of firms in the competitive fringe

By substituting (3.112) into the above objective function, the monopolist's problem can be expressed as

max_{u_t} ∑_{t=0}^∞ β^t { (A_0 − A_1(q̄_t + Q_t) + v_t) Q_t − eQ_t − .5gQ_t² − .5cu_t² }        (3.118)

subject to (3.117)
This can be written

max_{u_t} −∑_{t=0}^∞ β^t { y_t′Ry_t + u_t′Qu_t }        (3.119)

subject to (3.117), where Q = c/2 and

R = − [ 0, 0, (A_0 − e)/2, 0, 0;
        0, 0, 1/2, 0, 0;
        (A_0 − e)/2, 1/2, −A_1 − .5g, −A_1/2, 0;
        0, 0, −A_1/2, 0, 0;
        0, 0, 0, 0, 0 ]
Equilibrium representation We can use the representation developed in (3.107) to express the solution of the monopolist's problem in the form

[ z_{t+1}; μ_{x,t+1} ] = [ m_11, m_12; m_21, m_22 ] [ z_t; μ_{x,t} ]        (3.120)

or

[ z_{t+1}; μ_{x,t+1} ] = m [ z_t; μ_{x,t} ]        (3.121)
The monopolist is free to set μ_{x,0}, but will find it optimal to set it to zero

Recall that z_t′ = [1 v_t Q_t q̄_t]

Thus, (3.121) includes the equilibrium law of motion for the quantity q̄_t of the competitive fringe

By construction, q̄_t satisfies the Euler equation of the representative firm in the competitive fringe
Numerical example We computed the optimal Stackelberg plan for parameter settings

A_0, A_1, ρ, C_ε, c, d, e, g, h, β = 100, 1, .8, .2, 1, 20, 20, .2, .2, .95 8

For these parameter values the decision rule takes the form

u_t = −F [ z_t; μ_{x,t} ]        (3.122)

which, in the history-dependent form (3.110), becomes

u_t = −0.44 u_{t−1} + σ_0 z_t + σ_1 z_{t−1}        (3.123)
Note how in representation (3.122) the monopolist's decision for u_t = Q_{t+1} − Q_t feeds back negatively on the implementation multiplier 9
Concluding remarks
This lecture is our first encounter with a class of problems in which optimal decision rules are
history dependent 10
We shall encounter another example in the lecture on history dependent policies
There are many more such problems - see chapters 20-24 of [LS12]
Appendix A: The stabilizing μ_t = Py_t

We verify that the matrix P associated with the stabilizing μ_0 = Py_0 satisfies the Riccati equation associated with the Bellman equation

Substituting μ_t = Py_t into (3.95) and (3.96) gives

0 = Qu_t + βB′Py_{t+1}    and    Py_t = Ry_t + βA′Py_{t+1}        (3.124)
8 These calculations were performed by the Python program from QuantEcon in examples/oligopoly.py.

9 We also computed impulse responses to the demand innovation ε_t. The impulse responses show that a demand innovation pushes the implementation multiplier down and leads the monopolist to expand output while the representative competitive firm contracts output in subsequent periods. The response of price to a demand shock innovation is to rise on impact but then to decrease in subsequent periods in response to the increase in total supply q̄ + Q engineered by the monopolist.

10 For another application of the techniques in this lecture and how they relate to the method recommended by [KP80b], please see the lecture on history dependent policies.
Solving the first equation of (3.124) for u_t gives the feedback rule

u_t = −Fy_t        (3.125)

with associated closed-loop law of motion

y_{t+1} = (A − BF)y_t        (3.126)

where

F = β(Q + βB′PB)^{−1}B′PA        (3.127)

Substituting (3.125) and (3.126) into the second equation of (3.124) gives

Py_t = [ R + βA′P(A − BF) ] y_t        (3.128)

For the right side of (3.128) to agree with the right side of (3.124) for any initial value of y_0 requires that

P = R + βA′PA − β²A′PB(Q + βB′PB)^{−1}B′PA        (3.129)

Equation (3.129) is the algebraic matrix Riccati equation associated with the optimal linear regulator for the system A, B, Q, R
Appendix B: Forecasting formulas
The decision rule for the competitive fringe incorporates forecasts of future prices from (3.121)
under m
Thus, the representative competitive firm uses equation (3.121) to forecast future values of ( Qt , qt )
in order to forecast pt
The representative competitive firm's forecasts are generated from the j-th iterate of (3.121):

[ z_{t+j}; μ_{x,t+j} ] = m^j [ z_t; μ_{x,t} ]        (3.130)
The following calculation verifies that the representative firm forecasts by iterating the law of motion associated with m

Write the Euler equation for i_t (3.115) in terms of a polynomial in the lag operator L and factor it to get

(1 − (β^{−1} + (1 + c^{−1}h))L + β^{−1}L²) = −(βλ)^{−1}L(1 − βλL^{−1})(1 − λL)

where λ ∈ (0, 1) and λ = 1 when h = 0

By taking the nonstochastic version of (3.115) and solving an unstable root forward and a stable root backward using the technique of Sargent (1979 or 1987a, chap. IX), we obtain

i_t = (λ − 1)q_t + c^{−1} ∑_{j=1}^∞ (βλ)^j (p_{t+j} − d)        (3.131)

or, using the forecasts of future prices generated by iterating (3.121),

i_t = (λ − 1)q_t + c^{−1} ∑_{j=1}^∞ (βλ)^j e_p m^j [ z_t; μ_{x,t} ]        (3.132)
Summing the geometric series gives

i_t = (λ − 1)q_t + c^{−1} βλ e_p m (I − βλm)^{−1} [ z_t; μ_{x,t} ]        (3.133)

where e_p = [ (A_0 − d)  1  −A_1  −A_1  0 ] is a vector that forms p_t − d upon postmultiplication by [ z_t; μ_{x,t} ]

It can be verified that the solution procedure builds in (3.133) as an identity, so that (3.133) agrees with

i_t = −P_22^{−1}P_21 z_t + P_22^{−1}μ_{x,t}        (3.134)
Exercises

Exercise 1 The government's objective is to minimize

∑_{t=0}^∞ .95^t { (p_t − p̄)² + u_t² + .00001 m_t² }        (3.137)

subject to the constraints given in (3.135) and (3.136)

Code can be found in the file lqcontrol.py from the QuantEcon package that implements the optimal linear regulator
Exercise 2 A representative consumer has quadratic utility functional

∑_{t=0}^∞ β^t { −.5(b − c_t)² }        (3.138)

where β ∈ (0, 1), b > 0 is a bliss level of consumption, and c_t is time t consumption. The consumer faces the sequence of budget constraints

c_t + a_{t+1} = (1 + r)a_t + y_t − τ_t        (3.139)

where

a_t is the household's holdings of an asset at the beginning of t
r > 0 is a constant net interest rate satisfying β(1 + r) < 1
y_t is the consumer's endowment at t
τ_t is a lump-sum tax at t

The consumer's plan for (c_t, a_{t+1}) has to obey the boundary condition

∑_{t=0}^∞ β^t a_t² < +∞        (3.140)

A planner chooses {c_t, τ_t}_{t=0}^∞ to maximize

−∑_{t=0}^∞ β^t { .5(c_t − b)² + τ_t² }        (3.141)

subject to the implementability constraints in (3.139) for t ≥ 0 and

λ_t = β(1 + r)λ_{t+1}        (3.142)

for t ≥ 0, where λ_t := (b − c_t)

a. Argue that (3.142) is the Euler equation for a consumer who maximizes (3.138) subject to (3.139), taking {τ_t} as a given sequence

b. Formulate the planner's problem as a Stackelberg problem

c. For β = .95, b = 30, β(1 + r) = .95, formulate an artificial optimal linear regulator problem and use it to solve the Stackelberg problem

d. Give a recursive representation of the Stackelberg plan for τ_t
Covariance Stationary Processes

Overview
In this lecture we study covariance stationary linear stochastic processes, a class of models routinely used to study economic and financial time series

This class has the advantage of being

1. simple enough to be described by an elegant and comprehensive theory
2. relatively broad in terms of the kinds of dynamics it can represent

We consider these models in both the time and frequency domain

ARMA Processes We will focus much of our attention on linear covariance stationary models with a finite number of parameters

In particular, we will study stationary ARMA processes, which form a cornerstone of the standard theory of time series analysis

It's well known that every ARMA process can be represented in linear state space form

However, ARMA processes have some important structure that makes it valuable to study them separately
Spectral Analysis Analysis in the frequency domain is also called spectral analysis
In essence, spectral analysis provides an alternative representation of the autocovariance of a covariance stationary process
Having a second representation of this important object
shines new light on the dynamics of the process in question
allows for a simpler, more tractable representation in certain important cases
The famous Fourier transform and its inverse are used to map between the two representations
Other Reading For supplementary reading, see
[LS12], chapter 2
457
[Sar87], chapter 11
John Cochrane's notes on time series analysis, chapter 8
[Shi95], chapter 6
[CC08], all
Introduction
Example 2: General Linear Processes From the simple building block provided by white noise, we can construct a very flexible family of covariance stationary processes: the general linear processes

X_t = ∑_{j=0}^∞ ψ_j ε_{t−j},    t ∈ Z        (3.143)

where

{ε_t} is white noise
{ψ_t} is a square summable sequence in R (that is, ∑_{t=0}^∞ ψ_t² < ∞)

The autocovariance function of this process is

γ(k) = σ² ∑_{j=0}^∞ ψ_j ψ_{j+k}        (3.144)

By the Cauchy-Schwarz inequality one can show that the last expression is finite. Clearly it does not depend on t
Wold's Decomposition Remarkably, the class of general linear processes goes a long way towards describing the entire class of zero-mean covariance stationary processes

In particular, Wold's theorem states that every zero-mean covariance stationary process {X_t} can be written as

X_t = ∑_{j=0}^∞ ψ_j ε_{t−j} + η_t

where

{ε_t} is white noise
{ψ_t} is square summable
η_t can be expressed as a linear function of X_{t−1}, X_{t−2}, ... and is perfectly predictable over arbitrarily long horizons

For intuition and further discussion, see [Sar87], p. 286
AR and MA General linear processes are a very broad class of processes, and it often pays to specialize to those for which there exists a representation having only finitely many parameters

(In fact, experience shows that models with a relatively small number of parameters typically perform better than larger models, especially for forecasting)

One very simple example of such a model is the AR(1) process

X_t = φX_{t−1} + ε_t    where    |φ| < 1        (3.145)

By direct substitution, it is easy to verify that X_t = ∑_{j=0}^∞ φ^j ε_{t−j}
Applying (3.144) to the previous expression for X_t, we get the AR(1) autocovariance function

γ(k) = φ^k σ² / (1 − φ²),    k = 0, 1, ...        (3.146)

The next figure plots this function for φ = 0.8 and φ = −0.8 with σ = 1

Another very simple process is the MA(1) process X_t = ε_t + θε_{t−1}, which, by (3.144), has autocovariances

γ(0) = (1 + θ²)σ²,    γ(1) = θσ²,    and    γ(k) = 0    for all    k > 1
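The figure just described can be reproduced with a short script; here is a minimal sketch using only NumPy and Matplotlib:

import numpy as np
import matplotlib.pyplot as plt

# Plot the AR(1) autocovariance function (3.146) for phi = 0.8 and
# phi = -0.8, with sigma = 1
sigma = 1.0
k = np.arange(16)
fig, axes = plt.subplots(2, 1)
for ax, phi in zip(axes, (0.8, -0.8)):
    gamma = phi**k * sigma**2 / (1 - phi**2)
    ax.stem(k, gamma)
    ax.set_title(r'$\phi = {}$'.format(phi))
plt.show()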
The AR(1) can be generalized to an AR(p) and likewise for the MA(1)
Putting all of this together, we get the
ARMA Processes A stochastic process {X_t} is called an autoregressive moving average process, or ARMA(p, q), if it can be written as

X_t = φ_1 X_{t−1} + ⋯ + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + ⋯ + θ_q ε_{t−q}        (3.147)

where {ε_t} is white noise

A convenient shorthand for expressions such as (3.147) is the lag operator L, defined by L^k X_t := X_{t−k}; algebraic manipulations treating the lag operator as an ordinary scalar often are legitimate

Using L, we can rewrite (3.147) as

L⁰X_t − φ_1 L¹X_t − ⋯ − φ_p L^p X_t = L⁰ε_t + θ_1 L¹ε_t + ⋯ + θ_q L^q ε_t        (3.148)

If we let φ(z) and θ(z) be the polynomials

φ(z) := 1 − φ_1 z − ⋯ − φ_p z^p    and    θ(z) := 1 + θ_1 z + ⋯ + θ_q z^q        (3.149)

then (3.148) simplifies further to

φ(L)X_t = θ(L)ε_t        (3.150)

In what follows we always assume that the roots of the polynomial φ(z) lie outside the unit circle in the complex plane

This condition is sufficient to guarantee that the ARMA(p, q) process is covariance stationary

In fact it implies that the process falls within the class of general linear processes described above

That is, given an ARMA(p, q) process {X_t} satisfying the unit circle condition, there exists a square summable sequence {ψ_t} with X_t = ∑_{j=0}^∞ ψ_j ε_{t−j} for all t

The sequence {ψ_t} can be obtained by a recursive procedure outlined on page 79 of [CC08]

In this context, the function t ↦ ψ_t is often called the impulse response function
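As a preview of the implementation discussed later in this lecture, the sequence {ψ_t} can be computed numerically with scipy.signal.dimpulse; a minimal sketch:

from scipy.signal import dimpulse

# psi_t for the ARMA(1, 2) process X_t = 0.5 X_{t-1} + e_t - 0.8 e_{t-2};
# scipy.signal expects the two polynomials zero-padded to equal length
ar_poly = (1, -0.5, 0)    # coefficients of phi(z) = 1 - 0.5 z
ma_poly = (1, 0, -0.8)    # coefficients of theta(z) = 1 - 0.8 z^2
times, psi = dimpulse((ma_poly, ar_poly, 1), n=10)
print(psi[0].flatten())   # psi_0, psi_1, ..., psi_9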
Spectral Analysis
Autocovariance functions provide a great deal of information about covariance stationary processes
In fact, for zero-mean Gaussian processes, the autocovariance function characterizes the entire
joint distribution
Even for non-Gaussian processes, it provides a significant amount of information
It turns out that there is an alternative representation of the autocovariance function of a covariance stationary process, called the spectral density
At times, the spectral density is easier to derive, easier to manipulate and provides additional
intuition
Complex Numbers Before discussing the spectral density, we invite you to recall the main properties of complex numbers (or skip to the next section)
It can be helpful to remember that, in a formal sense, complex numbers are just points ( x, y) R2
endowed with a specific notion of multiplication
When ( x, y) is regarded as a complex number, x is called the real part and y is called the imaginary
part
The modulus or absolute value of a complex number z = (x, y) is just its Euclidean norm in R², but is usually written as |z| instead of ‖z‖
The product of two complex numbers (x, y) and (u, v) is defined to be (xu − vy, xv + yu), while addition is standard pointwise vector addition

When endowed with these notions of multiplication and addition, the set of complex numbers forms a field; addition and multiplication play well together, just as they do in R

The complex number (x, y) is often written as x + iy, where i is called the imaginary unit, and is understood to obey i² = −1

The x + iy notation can be thought of as an easy way to remember the definition of multiplication given above, because, proceeding naively,

(x + iy)(u + iv) = xu − yv + i(xv + yu)

Spectral Densities The spectral density f of a covariance stationary process with autocovariance function γ is defined for ω ∈ R as

f(ω) := ∑_{k∈Z} γ(k) e^{iωk}

(Some authors normalize the expression on the right by constants such as 1/π; the chosen convention makes little difference provided you are consistent)

Using the fact that γ is even, in the sense that γ(t) = γ(−t) for all t, you should be able to show that

f(ω) = γ(0) + 2 ∑_{k≥1} γ(k) cos(ωk)        (3.151)
Example 1: White Noise It is not difficult to check that the spectral density of the white noise process {ε_t} with standard deviation σ is the constant function f(ω) = σ²

(White light has this property when frequency refers to the visible spectrum, a connection that provides the origins of the term white noise)
Example 2: AR, MA and ARMA It is an exercise to show that the MA(1) process X_t = θε_{t−1} + ε_t has spectral density

f(ω) = σ²(1 + 2θ cos(ω) + θ²)        (3.152)

With a bit more effort, it's possible to show (see, e.g., p. 261 of [Sar87]) that the spectral density of the AR(1) process X_t = φX_{t−1} + ε_t is

f(ω) = σ² / (1 − 2φ cos(ω) + φ²)        (3.153)

More generally, it can be shown that the spectral density of the ARMA process (3.147) is

f(ω) = | θ(e^{iω}) / φ(e^{iω}) |² σ²        (3.154)

where

σ is the standard deviation of the white noise process {ε_t}
the polynomials φ(·) and θ(·) are as defined in (3.149)
The derivation of (3.154) uses the fact that convolutions become products under Fourier transformations
The proof is elegant and can be found in many places see, for example, [Sar87], chapter 11,
section 4
It's a nice exercise to verify that (3.152) and (3.153) are indeed special cases of (3.154)
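For instance, the MA(1) case can be checked numerically; a minimal sketch:

import numpy as np

# Check that the MA(1) spectral density (3.152) agrees with the general
# ARMA expression (3.154) when phi(z) = 1 and theta(z) = 1 + theta z
theta, sigma = 0.4, 1.0
w = np.linspace(0, np.pi, 5)
f_ma1 = sigma**2 * (1 + 2 * theta * np.cos(w) + theta**2)   # (3.152)
z = np.exp(1j * w)
f_arma = np.abs(1 + theta * z)**2 * sigma**2                # (3.154)
print(np.allclose(f_ma1, f_arma))  # True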
Interpreting the Spectral Density Plotting (3.153) reveals the shape of the spectral density for the AR(1) model when φ takes the values 0.8 and −0.8 respectively

These spectral densities correspond to the autocovariance functions for the AR(1) process shown above

Informally, we think of the spectral density as being large at those ω ∈ [0, π] such that the autocovariance function exhibits significant cycles at this frequency

To see the idea, let's consider why, in the lower panel of the preceding figure, the spectral density for the case φ = −0.8 is large at ω = π

Recall that the spectral density can be expressed as

f(ω) = γ(0) + 2 ∑_{k≥1} γ(k) cos(ωk) = γ(0) + 2 (σ²/(1 − φ²)) ∑_{k≥1} (−0.8)^k cos(ωk)        (3.155)

When we evaluate this at ω = π, we get a large number because cos(πk) is large and positive when (−0.8)^k is positive, and large in absolute value and negative when (−0.8)^k is negative
Hence the product is always large and positive, and hence the sum of the products on the righthand side of (3.155) is large
These ideas are illustrated in the next figure, which has k on the horizontal axis (click to enlarge)
On the other hand, if we evaluate f(ω) at ω = π/3, then the cycles are not matched, the sequence γ(k)cos(ωk) contains both positive and negative terms, and hence the sum of these terms is much smaller
In summary, the spectral density is large at frequencies where the autocovariance function exhibits cycles
Inverting the Transformation We have just seen that the spectral density is useful in the sense that it provides a frequency-based perspective on the autocovariance structure of a covariance stationary process

Another reason that the spectral density is useful is that it can be inverted to recover the autocovariance function via the inverse Fourier transform

In particular, for all k ∈ Z, we have

γ(k) = (1/2π) ∫_{−π}^{π} f(ω) e^{iωk} dω        (3.156)
This is convenient in situations where the spectral density is easier to calculate and manipulate
than the autocovariance function
(For example, the expression (3.154) for the ARMA spectral density is much easier to work with
than the expression for the ARMA autocovariance)
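A minimal numerical illustration of (3.156), recovering the MA(1) autocovariances from the spectral density (3.152) via the inverse FFT (this anticipates the calculation described at the end of this lecture):

import numpy as np

# Recover gamma(0) = (1 + theta^2) sigma^2 and gamma(1) = theta sigma^2
# from the MA(1) spectral density, using (3.156) in discretized form
theta, sigma, n = 0.4, 1.0, 512
w = 2 * np.pi * np.arange(n) / n                      # Fourier frequencies
f = sigma**2 * (1 + 2 * theta * np.cos(w) + theta**2)
gamma = np.fft.ifft(f).real                           # approximates (3.156)
print(gamma[:3])  # approx 1.16, 0.4, 0.0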
Mathematical Theory This section is loosely based on [Sar87], p. 249-253, and included for those who would like a deeper understanding of spectral densities

Recall that any element f of a separable Hilbert space can be expanded in terms of an orthonormal basis {h_k} as

f = ∑_k α_k h_k    where    α_k := ⟨f, h_k⟩        (3.157)

For the spectral density, the relevant space is L²[−π, π], with

inner product ⟨g, h⟩ = ∫_{−π}^{π} g(ω)h(ω) dω

{h_k} = the orthonormal basis for L²[−π, π] given by the set of trigonometric functions

h_k(ω) = e^{iωk} / √(2π),    k ∈ Z,    ω ∈ [−π, π]

Let T be the linear isometry mapping a square-summable sequence γ = {γ(k)} in ℓ² into the function Tγ := ∑_{k∈Z} γ(k) h_k in L²[−π, π]

Using the definition of T and the fact that f is even, we now have

Tγ = ∑_{k∈Z} γ(k) (e^{iωk} / √(2π)) = (1/√(2π)) f(ω)        (3.158)

In other words, apart from a scalar multiple, the spectral density is just a transformation of γ ∈ ℓ² under a certain linear isometry; a different way to view γ
Most code for working with covariance stationary models deals with ARMA models

Python code for studying ARMA models can be found in the tsa submodule of statsmodels

Since this code doesn't quite cover our needs, particularly vis-a-vis spectral analysis, we've put together the module arma.py, which is part of the QuantEcon package

The module provides functions for mapping ARMA(p, q) models into their

1. impulse response function
2. simulated time series
3. autocovariance function
4. spectral density

In addition to individual plots of these entities, we provide functionality to generate 2x2 plots containing all this information

In other words, we want to replicate the plots on pages 68-69 of [LS12]

Here's an example corresponding to the model X_t = 0.5X_{t−1} + ε_t − 0.8ε_{t−2}
        where

            * :math:`\phi = (\phi_1, \phi_2,..., \phi_p)`
            * :math:`\theta = (\theta_1, \theta_2,..., \theta_q)`
            * :math:`\sigma` is a scalar, the standard deviation of
              the white noise

    Parameters
    ----------
    phi : scalar or iterable or array_like(float)
        Autocorrelation values for the autocorrelated variable.
        See above for explanation.
    ...

    Attributes
    ----------
    ar_poly : array_like(float)
        The polynomial form that is needed by scipy.signal to do the
        processing we desire.  Corresponds with the phi values
    ma_poly : array_like(float)
        The polynomial form that is needed by scipy.signal to do the
        processing we desire.  Corresponds with the theta values
    """
    def __init__(self, phi, theta=0, sigma=1):
        self._phi, self._theta = phi, theta
        self.sigma = sigma
        self.set_params()

    @property
    def phi(self):
        return self._phi

    @phi.setter
    def phi(self, new_value):
        self._phi = new_value
        self.set_params()

    @property
    def theta(self):
        return self._theta

    @theta.setter
    def theta(self, new_value):
        self._theta = new_value
        self.set_params()

    def set_params(self):
        r"""
        Internally, scipy.signal works with systems of the form

        .. math::

            ar_{poly}(L) X_t = ma_{poly}(L) \epsilon_t

        where L is the lag operator. To match this, we set

        .. math::
            ...
        """
        ...

    def spectral_density(self, two_pi=True, res=1200):
        r"""
        Compute the spectral density function over the interval
        determined by two_pi.

        Parameters
        ----------
        two_pi : Boolean, optional
            Compute the spectral density function over [0, pi] if
            two_pi is False and [0, 2 pi] otherwise. Default value is
            True
        res : scalar or array_like(int), optional(default=1200)
            If res is a scalar then the spectral density is computed at
            `res` frequencies evenly spaced around the unit circle, but
            if res is an array then the function computes the response
            at the frequencies given by the array
        Returns
        -------
        w : array_like(float)
            The normalized frequencies at which h was computed, in
            radians/sample
        spect : array_like(float)
            The frequency response
        """
        w, h = freqz(self.ma_poly, self.ar_poly, worN=res, whole=two_pi)
        spect = h * conj(h) * self.sigma**2

        return w, spect
    def autocovariance(self, num_autocov=16):
        """
        Compute the autocovariance function from the ARMA parameters
        over the integers range(num_autocov) using the spectral density
        and the inverse Fourier transform.

        Parameters
        ----------
        num_autocov : scalar(int), optional(default=16)
            The number of autocovariances to calculate
        """
        spect = self.spectral_density()[1]
        acov = np.fft.ifft(spect).real

        # num_autocov should be <= len(acov) / 2
        return acov[:num_autocov]

    def simulation(self, ts_length=90):
        """
        Compute a simulated sample path assuming Gaussian shocks.

        Parameters
        ----------
        ts_length : scalar(int), optional(default=90)
            Number of periods to simulate for

        Returns
        -------
        vals : array_like(float)
            A simulation of the model that corresponds to this class
        """
        sys = self.ma_poly, self.ar_poly, 1
        u = np.random.randn(ts_length, 1) * self.sigma
        vals = dlsim(sys, u)[1]

        return vals.flatten()
    def plot_impulse_response(self, ax=None, show=True):
        if show:
            fig, ax = plt.subplots()
        ax.set_title('Impulse response')
        yi = self.impulse_response()
        ax.stem(list(range(len(yi))), yi)
        ax.set_xlim(xmin=(-0.5))
        ax.set_ylim(min(yi)-0.1, max(yi)+0.1)
        ax.set_xlabel('time')
        ax.set_ylabel('response')
        if show:
            plt.show()

    def plot_spectral_density(self, ax=None, show=True):
        if show:
            fig, ax = plt.subplots()
        ax.set_title('Spectral density')
        w, spect = self.spectral_density(two_pi=False)
        ax.semilogy(w, spect)
        ax.set_xlim(0, pi)
        ax.set_ylim(0, np.max(spect))
        ax.set_xlabel('frequency')
        ax.set_ylabel('spectrum')
        if show:
            plt.show()

    def plot_autocovariance(self, ax=None, show=True):
        if show:
            fig, ax = plt.subplots()
        ax.set_title('Autocovariance')
        acov = self.autocovariance()
        ax.stem(list(range(len(acov))), acov)
        ax.set_xlim(-0.5, len(acov) - 0.5)
        ax.set_xlabel('time')
        ax.set_ylabel('autocovariance')
        if show:
            plt.show()

    def plot_simulation(self, ax=None, show=True):
        if show:
            fig, ax = plt.subplots()
We also permit phi and theta to be scalars, in which case the model will be interpreted as

X_t = φX_{t−1} + ε_t + θε_{t−1}
The two numerical packages most useful for working with ARMA models are scipy.signal and
numpy.fft
The package scipy.signal expects the parameters to be passed in to its functions in a manner
consistent with the alternative ARMA notation (3.150)
For example, the impulse response sequence {t } discussed above can be obtained using
scipy.signal.dimpulse, and the function call should be of the form
times, psi = dimpulse((ma_poly, ar_poly, 1), n=impulse_length)
where ma_poly and ar_poly correspond to the polynomials in (3.149); that is,

ma_poly is the vector (1, θ_1, θ_2, ..., θ_q)
ar_poly is the vector (1, −φ_1, −φ_2, ..., −φ_p)
To this end, we also maintain the arrays ma_poly and ar_poly as instance data, with their values
computed automatically from the values of phi and theta supplied by the user
If the user decides to change the value of either phi or theta ex-post by assignments such as
lp.phi = (0.5, 0.2)
lp.theta = (0, -0.1)
then ma_poly and ar_poly should update automatically to reflect these new parameters
This is achieved in our implementation by using Descriptors
Computing the Autocovariance Function As discussed above, for ARMA processes the spectral
density has a simple representation that is relatively easy to calculate
Given this fact, the easiest way to obtain the autocovariance function is to recover it from the
spectral density via the inverse Fourier transform
Here we use NumPy's Fourier transform package np.fft, which wraps a standard Fortran-based package called FFTPACK

A look at the np.fft documentation shows that the inverse transform np.fft.ifft takes a given sequence A_0, A_1, ..., A_{n−1} and returns the sequence a_0, a_1, ..., a_{n−1} defined by

a_k = (1/n) ∑_{t=0}^{n−1} A_t e^{ik2πt/n}

Thus, if we set A_t = f(ω_t), where f is the spectral density and ω_t := 2πt/n, then

a_k = (1/n) ∑_{t=0}^{n−1} f(ω_t) e^{iω_t k} = (1/2π) (2π/n) ∑_{t=0}^{n−1} f(ω_t) e^{iω_t k},    ω_t := 2πt/n

For n sufficiently large, we then have

a_k ≈ (1/2π) ∫_0^{2π} f(ω) e^{iωk} dω = (1/2π) ∫_{−π}^{π} f(ω) e^{iωk} dω

(You can check the last equality)

In view of (3.156), we have now shown that, for n sufficiently large, a_k ≈ γ(k), which is exactly what we want
Estimation of Spectra

Overview

Recall that the spectral density f of a covariance stationary process with autocovariance function γ can be written as

f(ω) = γ(0) + 2 ∑_{k≥1} γ(k) cos(ωk),    ω ∈ R
Now consider the problem of estimating the spectral density of a given time series, when γ is unknown

In particular, let X_0, ..., X_{n−1} be n consecutive observations of a single time series that is assumed to be covariance stationary

The most common estimator of the spectral density of this process is the periodogram of X_0, ..., X_{n−1}, which is defined as

I(ω) := (1/n) | ∑_{t=0}^{n−1} X_t e^{itω} |²,    ω ∈ R        (3.159)

(Recall that |z| denotes the modulus of the complex number z)

It can also be expressed as

I(ω) = (1/n) { [ ∑_{t=0}^{n−1} X_t cos(ωt) ]² + [ ∑_{t=0}^{n−1} X_t sin(ωt) ]² }

It is straightforward to show that the function I is even and 2π-periodic (i.e., I(ω) = I(−ω) and I(ω + 2π) = I(ω) for all ω ∈ R)
From these two results, you will be able to verify that the values of I on [0, ] determine the values
of I on all of R
The next section helps to explain the connection between the periodogram and the spectral density
Interpretation To interpret the periodogram, it is convenient to focus on its values at the Fourier frequencies

ω_j := 2πj/n,    j = 0, ..., n − 1

In what sense is I(ω_j) an estimate of f(ω_j)?

The answer is straightforward, although it does involve some algebra

With a bit of effort one can show that, for any integer j > 0,

∑_{t=0}^{n−1} e^{itω_j} = ∑_{t=0}^{n−1} exp(i2πj t/n) = 0

Letting X̄ denote the sample mean n^{−1} ∑_{t=0}^{n−1} X_t, we then have

n I(ω_j) = | ∑_{t=0}^{n−1} (X_t − X̄) e^{itω_j} |² = ∑_{t=0}^{n−1} (X_t − X̄)² + 2 ∑_{k=1}^{n−1} ∑_{t=k}^{n−1} (X_t − X̄)(X_{t−k} − X̄) cos(ω_j k)
Now let

γ̂(k) := (1/n) ∑_{t=k}^{n−1} (X_t − X̄)(X_{t−k} − X̄),    k = 0, 1, ..., n − 1

This is the sample autocovariance function, the natural plug-in estimator of the autocovariance function

(Plug-in estimator is an informal term for an estimator found by replacing expectations with sample means)

With this notation, we can now write

I(ω_j) = γ̂(0) + 2 ∑_{k=1}^{n−1} γ̂(k) cos(ω_j k)

Recalling our expression for f given above, we see that I(ω_j) is just a sample analog of f(ω_j)
Calculation Let's now consider how to compute the periodogram as defined in (3.159)

There are already functions available that will do this for us; an example is statsmodels.tsa.stattools.periodogram in the statsmodels package

However, it is very simple to replicate their results, and this will give us a platform to make useful extensions

The most common way to calculate the periodogram is via the discrete Fourier transform, which in turn is implemented through the fast Fourier transform algorithm

In general, given a sequence a_0, ..., a_{n−1}, the discrete Fourier transform computes the sequence

A_j := ∑_{t=0}^{n−1} a_t exp( −i2π tj/n ),    j = 0, ..., n − 1

With numpy.fft.fft imported as fft and a_0, ..., a_{n−1} stored in NumPy array a, the function call fft(a) returns the values A_0, ..., A_{n−1} as a NumPy array

It follows that, when the data X_0, ..., X_{n−1} is stored in array X, the values I(ω_j) at the Fourier frequencies, which are given by

(1/n) | ∑_{t=0}^{n−1} X_t exp( −i2π tj/n ) |²,    j = 0, ..., n − 1

can be computed by np.abs(fft(X))**2 / len(X)
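A minimal periodogram function along these lines (the fuller version provided via QuantEcon, discussed below, adds smoothing and other refinements):

import numpy as np
from numpy.fft import fft

def periodogram(x):
    # I(w_j) = (1/n) |sum_t x_t e^{i t w_j}|^2 at the Fourier
    # frequencies w_j = 2 pi j / n, j = 0, ..., n - 1
    n = len(x)
    I_w = np.abs(fft(x))**2 / n
    w = 2 * np.pi * np.arange(n) / n
    return w, I_w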
Let's generate some data for this function using the ARMA class from QuantEcon

(See the lecture on linear processes for details on this class)

Here's a code snippet that, once the preceding code has been run, generates data from the process

X_t = 0.5X_{t−1} + ε_t − 0.8ε_{t−2}        (3.160)

where {ε_t} is white noise with unit variance, and compares the periodogram to the actual spectral density
import matplotlib.pyplot as plt
from quantecon import ARMA

n = 40                        # Data size
phi, theta = 0.5, (0, -0.8)   # AR and MA parameters
lp = ARMA(phi, theta)
X = lp.simulation(ts_length=n)

fig, ax = plt.subplots()
x, y = periodogram(X)
ax.plot(x, y, 'b-', lw=2, alpha=0.5, label='periodogram')
x_sd, y_sd = lp.spectral_density(two_pi=False, resolution=120)
ax.plot(x_sd, y_sd, 'r-', lw=2, alpha=0.8, label='spectral density')
ax.legend()
plt.show()
478
This estimate looks rather disappointing, but the data size is only 40, so perhaps its not surprising
that the estimate is poor
However, if we try again with n = 1200 the outcome is not much better
The periodogram is far too irregular relative to the underlying spectral density
This brings us to our next topic
Smoothing

The standard way to reduce the variability of the periodogram is smoothing: we replace each value I(ω_j) with a local weighted average

I_S(ω_j) := ∑_{ℓ=−p}^{p} w(ℓ) I(ω_{j+ℓ})        (3.161)

where the weights w(−p), ..., w(p) are a sequence of 2p + 1 nonnegative values summing to one

In general, larger values of p indicate more smoothing; more on this below
The next figure shows the kind of sequence typically used
Note the smaller weights towards the edges and larger weights in the center, so that values at frequencies more distant from $\omega_j$ receive less weight than closer ones in the sum (3.161)
Estimation with Smoothing Our next step is to provide code that will not only estimate the
periodogram but also provide smoothing as required
Such functions have been written in estspec.py and are available via QuantEcon
The file estspec.py is printed below
"""
Filename: estspec.py
Authors: Thomas Sargent, John Stachurski
Functions for working with periodograms of scalar data.
"""
from __future__ import division, print_function
import numpy as np
from numpy.fft import fft
from pandas import ols, Series
def smooth(x, window_len=7, window='hanning'):
"""
Smooth the data in x using convolution with a window of requested
size and type.
Parameters
480
---------x : array_like(float)
A flat NumPy array containing the data to smooth
window_len : scalar(int), optional
An odd integer giving the length of the window. Defaults to 7.
window : string
A string giving the window type. Possible values are 'flat',
'hanning', 'hamming', 'bartlett' or 'blackman'
Returns
------array_like(float)
The smoothed values
Notes
----Application of the smoothing window at the top and bottom of x is
done by reflecting x around these points to extend it sufficiently
in each direction.
"""
if len(x) < window_len:
raise ValueError("Input vector length must be >= window length.")
if window_len < 3:
raise ValueError("Window length must be at least 3.")
if not window_len % 2: # window_len is even
window_len += 1
print("Window length reset to {}".format(window_len))
windows = {'hanning': np.hanning,
'hamming': np.hamming,
'bartlett': np.bartlett,
'blackman': np.blackman,
'flat': np.ones # moving average
}
# === Reflect x around x[0] and x[-1] prior to convolution === #
k = int(window_len / 2)
xb = x[:k]
# First k elements
xt = x[-k:] # Last k elements
s = np.concatenate((xb[::-1], x, xt[::-1]))
# === Select window values === #
if window in windows.keys():
w = windows[window](window_len)
else:
msg = "Unrecognized window type '{}'".format(window)
print(msg + " Defaulting to hanning")
w = windows['hanning'](window_len)
return np.convolve(w / w.sum(), s, mode='valid')
The listing displays three functions, smooth(), periodogram() and ar_periodogram(). We will discuss the first two here and the third one below
The periodogram() function returns a periodogram, optionally smoothed via the smooth() function
Regarding the smooth() function, since smoothing adds a nontrivial amount of computation, we
have applied a fairly terse array-centric method based around np.convolve
Readers are left to either explore or simply use this code according to their interests
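For example, assuming the definitions above, a smoothed periodogram can be produced along the following lines (the window type and length here are illustrative choices on our part, not values from the lectures)

x, y = periodogram(X)
y_smoothed = smooth(y, window_len=15, window='hamming')  # smoothed values at x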
The next three figures each show smoothed and unsmoothed periodograms, as well as the true
spectral density
(The model is the same as before (see equation (3.160)) and there are 400 observations)
From top figure to bottom, the window length is varied from small to large
In looking at the figure, we can see that for this model and data size, the window length chosen in
the middle figure provides the best fit
Relative to this value, the first window length provides insufficient smoothing, while the third
gives too much smoothing
Of course in real estimation problems the true spectral density is not visible, and the choice of appropriate smoothing will have to be made based on judgement/priors or some other theory
Pre-Filtering and Smoothing In the code listing above we showed three functions from the file
estspec.py
The third function in the file (ar_periodogram()) adds a pre-processing step to periodogram
smoothing
First we describe the basic idea, and after that we give the code
The essential idea is to
1. Transform the data in order to make estimation of the spectral density more efficient
2. Compute the periodogram associated with the transformed data
3. Reverse the effect of the transformation on the periodogram, so that it now estimates the
spectral density of the original process
Step 1 is called pre-filtering or pre-whitening, while step 3 is called recoloring
The first step is called pre-whitening because the transformation is usually designed to turn the
data into something closer to white noise
Why would this be desirable in terms of spectral density estimation?
The reason is that we are smoothing our estimated periodogram based on estimated values at nearby points; recall (3.161)
The underlying assumption that makes this a good idea is that the true spectral density is relatively regular: the value of $I(\omega)$ is close to that of $I(\omega')$ when $\omega$ is close to $\omega'$
This will not be true in all cases, but it is certainly true for white noise
For white noise, $I$ is as regular as possible: it is a constant function
In this case, values of $I(\omega')$ at points $\omega'$ near to $\omega$ provide the maximum possible amount of information about the value $I(\omega)$
Another way to put this is that if I is relatively constant, then we can use a large amount of
smoothing without introducing too much bias
The AR(1) Setting Let's examine this idea more carefully in a particular setting where the data is assumed to be AR(1)
(More general ARMA settings can be handled using similar techniques to those described below)
Suppose in particular that $\{X_t\}$ is covariance stationary and AR(1), with

$$X_{t+1} = \alpha + \phi X_t + \epsilon_{t+1} \tag{3.162}$$

where $\alpha$ and $\phi \in (-1, 1)$ are unknown parameters and $\{\epsilon_t\}$ is white noise
It follows that if we regress Xt+1 on Xt and an intercept, the residuals will approximate white
noise
Let
$g$ be the spectral density of $\{\epsilon_t\}$: a constant function, as discussed above
$I_0$ be the periodogram estimated from the residuals: an estimate of $g$
$f$ be the spectral density of $\{X_t\}$
Since $\{X_t\}$ is AR(1), $f$ and $g$ are related by

$$f(\omega) = \left| \frac{1}{1 - \phi e^{i\omega}} \right|^2 g(\omega) \tag{3.163}$$

This suggests that the recoloring step, which constructs an estimate $I$ of $f$ from $I_0$, should set

$$I(\omega) = \left| \frac{1}{1 - \hat{\phi} e^{i\omega}} \right|^2 I_0(\omega)$$

where $\hat{\phi}$ is the OLS estimate of $\phi$
The code for ar_periodogram(), the third function in estspec.py, does exactly this (see the code here)
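A minimal sketch of the three steps, written against the periodogram() and smooth() functions above (the actual ar_periodogram() in estspec.py may differ in its details)

import numpy as np

def ar_periodogram_sketch(x, window='hanning', window_len=21):
    # Step 1: pre-whiten -- regress x_{t+1} on an intercept and x_t by OLS
    Z = np.column_stack((np.ones(len(x) - 1), x[:-1]))
    b = np.linalg.lstsq(Z, x[1:], rcond=None)[0]
    phi_hat = b[1]
    e_hat = x[1:] - Z @ b                # residuals, approximately white noise
    # Step 2: smoothed periodogram of the residuals (an estimate of g)
    w, I_w = periodogram(e_hat)
    I_w = smooth(I_w, window_len=window_len, window=window)
    # Step 3: recolor -- divide by |1 - phi_hat e^{iw}|^2
    I_w = I_w / np.abs(1 - phi_hat * np.exp(1j * w))**2
    return w, I_w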
The next figure shows realizations of the two kinds of smoothed periodograms
1. standard smoothed periodogram, the ordinary smoothed periodogram, and
2. AR smoothed periodogram, the pre-whitened and recolored one generated by
ar_periodogram()
The periodograms are calculated from time series drawn from (3.162) with $\alpha = 0$ and $\phi = 0.9$
Each time series is of length 150
The difference between the three subfigures is just randomness: each one uses a different draw of the time series
In all cases, periodograms are fit with the hamming window and a window length of 65
Overall, the fit of the AR smoothed periodogram is much better, in the sense of being closer to the
true spectral density
Exercises
Solution notebook
Optimal Taxation

Overview
The treatment given here closely follows this manuscript, prepared by Thomas J. Sargent and
Francois R. Velde
We cover only the key features of the problem in this lecture, leaving you to refer to that source
for additional results and intuition
Model Features
Linear quadratic (LQ) model
Representative household
Stochastic dynamic programming over an infinite horizon
Distortionary taxation
The Ramsey Problem
We begin by outlining the key assumptions regarding technology, households and the government
sector
Technology Labor can be converted one-for-one into a single, non-storable consumption good
In the usual spirit of the LQ model, the amount of labor supplied in each period is unrestricted
This is unrealistic, but helpful when it comes to solving the model
Realistic labor supply can be induced by suitable parameter values
Households Consider a representative household who chooses a path $\{\ell_t, c_t\}$ for labor and consumption to maximize

$$-E \frac{1}{2} \sum_{t=0}^{\infty} \beta^t \left[ (c_t - b_t)^2 + \ell_t^2 \right] \tag{3.164}$$

subject to the budget constraint

$$E \sum_{t=0}^{\infty} \beta^t p_t^0 \left[ d_t + (1 - \tau_t) \ell_t + s_t - c_t \right] = 0 \tag{3.165}$$
Here
$\beta$ is a discount factor in $(0, 1)$
$p_t^0$ is the state price at time $t$
$b_t$ is a stochastic preference parameter
$d_t$ is an endowment process
$\tau_t$ is a flat tax rate on labor income
$s_t$ is a promised time-$t$ coupon payment on debt issued by the government
The budget constraint requires that the present value of consumption be restricted to the present
value of endowments, labor income and coupon payments on bond holdings
Government The government imposes a linear tax on labor income, fully committing to a
stochastic path of tax rates at time zero
The government also issues state-contingent debt
Given government tax and borrowing plans, we can construct a competitive equilibrium with
distorting government taxes
Among all such competitive equilibria, the Ramsey plan is the one that maximizes the welfare of
the representative consumer
Exogenous Variables Endowments, government expenditure, the preference parameter $b_t$ and promised coupon payments on initial government debt $s_t$ are all exogenous, and given by

$$d_t = S_d x_t, \quad g_t = S_g x_t, \quad b_t = S_b x_t, \quad s_t = S_s x_t$$

The matrices $S_d, S_g, S_b, S_s$ are primitives and $\{x_t\}$ is an exogenous stochastic process taking values in $\mathbb{R}^k$
We consider two specifications for $\{x_t\}$
1. Discrete case: $\{x_t\}$ is a discrete state Markov chain with transition matrix $P$ (simulated in the sketch below)
2. VAR case: $\{x_t\}$ obeys $x_{t+1} = A x_t + C w_{t+1}$, where $\{w_t\}$ is independent zero-mean Gaussian with identity covariance matrix
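For concreteness, here is a minimal sketch of simulating the discrete specification using QuantEcon's mc_sample_path, which the implementation below also relies on (the matrices here are illustrative toys, not the lecture's parameters)

import numpy as np
from quantecon import mc_sample_path

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])            # transition matrix
x_vals = np.array([[0.5, 1.5]])       # each column is a possible state x
states = mc_sample_path(P, init=0, sample_size=10)
x = x_vals[:, states]                 # realized path of {x_t}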
Feasibility The period-by-period feasibility restriction for this economy is

$$c_t + g_t = d_t + \ell_t \tag{3.166}$$

Government The government budget constraint requires that

$$E \sum_{t=0}^{\infty} \beta^t p_t^0 (s_t + g_t - \tau_t \ell_t) = 0 \tag{3.167}$$
The household's first-order conditions imply that, in equilibrium,

$$p_t^0 = \frac{b_t - c_t}{b_0 - c_0} \quad \text{and} \quad \tau_t = 1 - \frac{\ell_t}{b_t - c_t} \tag{3.168}$$

Substituting these into the government budget constraint (3.167) yields the implementability condition

$$E \sum_{t=0}^{\infty} \beta^t \left[ (b_t - c_t)(s_t + g_t - \ell_t) + \ell_t^2 \right] = 0 \tag{3.169}$$
The Ramsey problem now amounts to maximizing (3.164) subject to (3.169) and (3.166)
The associated Lagrangian is

$$L = E \sum_{t=0}^{\infty} \beta^t \left\{ -\frac{1}{2} \left[ (c_t - b_t)^2 + \ell_t^2 \right] + \lambda \left[ (b_t - c_t)(\ell_t - s_t - g_t) - \ell_t^2 \right] + \mu_t \left[ d_t + \ell_t - c_t - g_t \right] \right\} \tag{3.170}$$
The first-order conditions associated with $c_t$ and $\ell_t$ are

$$-(c_t - b_t) + \lambda \left[ -\ell_t + (g_t + s_t) \right] = \mu_t$$

and

$$\ell_t - \lambda \left[ (b_t - c_t) - 2\ell_t \right] = \mu_t$$
Combining these last two equalities with (3.166) and working through the algebra, one can show that

$$\ell_t = \bar{\ell}_t - \nu m_t \quad \text{and} \quad c_t = \bar{c}_t - \nu m_t \tag{3.171}$$

where

$$\nu := \frac{\lambda}{1 + 2\lambda}, \quad \bar{\ell}_t := \frac{b_t - d_t + g_t}{2}, \quad \bar{c}_t := \frac{b_t + d_t - g_t}{2}, \quad m_t := \frac{b_t - d_t - s_t}{2}$$

Apart from $\lambda$, all of these quantities are expressed in terms of exogenous variables
To solve for $\lambda$, we can use the government's budget constraint again
The term inside the brackets in (3.169) is $(b_t - c_t)(s_t + g_t) - (b_t - c_t)\ell_t + \ell_t^2$
Using (3.171), the definitions above and the fact that $\bar{\ell}_t = b_t - \bar{c}_t$, this term can be rewritten as

$$(b_t - \bar{c}_t)(g_t + s_t) + 2 m_t^2 (\nu^2 - \nu)$$

Reinserting into (3.169), we get

$$E \left\{ \sum_{t=0}^{\infty} \beta^t (b_t - \bar{c}_t)(g_t + s_t) \right\} + (\nu^2 - \nu) E \left\{ \sum_{t=0}^{\infty} \beta^t 2 m_t^2 \right\} = 0 \tag{3.172}$$

Defining

$$b_0 := E \left\{ \sum_{t=0}^{\infty} \beta^t (b_t - \bar{c}_t)(g_t + s_t) \right\} \quad \text{and} \quad a_0 := E \left\{ \sum_{t=0}^{\infty} \beta^t 2 m_t^2 \right\} \tag{3.173}$$

we can write (3.172) as the quadratic $b_0 + a_0 (\nu^2 - \nu) = 0$, which can be solved for $\nu$ and hence $\lambda$
It follows that both of these expectation terms are special cases of the expression

$$q(x_0) = E \sum_{t=0}^{\infty} \beta^t x_t' H x_t \tag{3.174}$$

where $H$ is a conformable matrix and $x_t'$ is the transpose of $x_t$
Suppose first that $\{x_t\}$ is the discrete Markov process described above, and set $h(x) := x' H x$
In this case the expression becomes

$$q(x_0) = E \sum_{t=0}^{\infty} \beta^t h(x_t) \quad \text{given} \quad x_0 = x_j$$

and standard Markov chain arguments give

$$q(x_0) = \sum_{t=0}^{\infty} \beta^t (P^t h)[j] \tag{3.175}$$
Here
$P^t$ is the $t$-th power of the transition matrix $P$
$h$ is, with some abuse of notation, the vector $(h(x_1), \ldots, h(x_N))$
$(P^t h)[j]$ indicates the $j$-th element of $P^t h$
It can be shown that (3.175) is in fact equal to the $j$-th element of the vector $(I - \beta P)^{-1} h$
This last fact is applied in the calculations below
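As a minimal sketch of that calculation (assuming a transition matrix P, a discount factor beta in (0, 1), and the vector h as defined above)

import numpy as np

def discrete_quadratic_sum(P, h, beta):
    # Return (I - beta*P)^{-1} h, whose j-th element is q(x_j)
    n = len(h)
    return np.linalg.solve(np.eye(n) - beta * P, h)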
Other Variables We are interested in tracking several other variables besides the ones described
above
One is the present value of government obligations outstanding at time $t$, which can be expressed as

$$B_t := E_t \sum_{j=0}^{\infty} \beta^j p_{t+j}^t \left( \tau_{t+j} \ell_{t+j} - g_{t+j} \right) \tag{3.176}$$

Using our expression for prices and the Ramsey plan, we can also write $B_t$ as

$$B_t = E_t \sum_{j=0}^{\infty} \beta^j \frac{(b_{t+j} - c_{t+j})(\tau_{t+j} \ell_{t+j} - g_{t+j})}{b_t - c_t}$$
Define

$$R_{tj}^{-1} := E_t \, \beta^j p_{t+j}^t$$

so that, rearranging,

$$B_t = \sum_{j=0}^{\infty} R_{tj}^{-1} \, E_t \left( \tau_{t+j} \ell_{t+j} - g_{t+j} \right)$$

Here $R_{tj}$ can be thought of as the gross $j$-period risk-free rate on holding government debt between $t$ and $t + j$
Furthermore, letting $R_t$ be the one-period risk-free rate, we define

$$\pi_{t+1} := B_{t+1} - R_t \left[ B_t - (\tau_t \ell_t - g_t) \right]$$

and

$$\Pi_t := \sum_{s=0}^{t} \pi_s$$

The term $\pi_{t+1}$ is the payout on the public's portfolio of government debt
As shown in the original manuscript, if we distort one-step-ahead transition probabilities by the adjustment factor

$$\xi_t := \frac{p_{t+1}^t}{E_t \, p_{t+1}^t}$$

then $\Pi_t$ is a martingale under the distorted probabilities
See the treatment in the manuscript for more discussion and intuition
For now we will concern ourselves with computation
Implementation
econ: a namedtuple of type 'Economy', containing
    beta     - Discount factor
    Sg       - Govt spending selector matrix
    Sd       - Exogenous endowment selector matrix
    Sb       - Utility parameter selector matrix
    Ss       - Coupon payments selector matrix
    discrete - Discrete exogenous process (True or False)
    proc     - Stochastic process parameters

Returns
========
path: a namedtuple of type 'Path', containing
    g    - Govt spending
    d    - Endowment
    b    - Utility shift parameter
    s    - Coupon payment on existing debt
    c    - Consumption
    l    - Labor
    p    - Price
    tau  - Tax rate
    rvn  - Revenue
    B    - Govt debt
    R    - Risk free gross return
    pi   - One-period risk-free interest rate
    Pi   - Cumulative rate of return, adjusted
    xi   - Adjustment factor for Pi

The corresponding values are flat numpy ndarrays.
"""
# == Simplify names == #
beta, Sg, Sd, Sb, Ss = econ.beta, econ.Sg, econ.Sd, econ.Sb, econ.Ss

if econ.discrete:
    P, x_vals = econ.proc
else:
    A, C = econ.proc

# == Simulate the exogenous process x == #
if econ.discrete:
    state = mc_sample_path(P, init=0, sample_size=T)
    x = x_vals[:, state]
else:
    # == Generate an initial condition x0 satisfying x0 = A x0 == #
    nx, nx = A.shape
    x0 = nullspace((eye(nx) - A))
    x0 = -x0 if (x0[nx-1] < 0) else x0
    x0 = x0 / x0[nx-1]

    # == Generate a time series x of length T starting from x0 == #
    nx, nw = C.shape
    x = zeros((nx, T))
    w = randn(nw, T)
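    # == Sketch of the remaining simulation step (assumed by us, following
    #    the comment above; the printed listing breaks off at this point) == #
    x[:, 0] = x0.flatten()
    for t in range(1, T):
        x[:, t] = A @ x[:, t-1] + C @ w[:, t]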
T = len(path.c)

# == Prepare axes == #
num_rows, num_cols = 2, 1
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 10))
plt.subplots_adjust(hspace=0.5)
bbox = (0., 1.02, 1., .102)
Comments on the Code The function var_quadratic_sum imported from quadsums is for computing the value of (3.174) when the exogenous process $\{x_t\}$ is of the VAR type described above
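For instance, a minimal usage sketch (toy scalar matrices, and assuming the call signature var_quadratic_sum(A, C, H, beta, x0) from QuantEcon's quadsums module)

import numpy as np
from quantecon.quadsums import var_quadratic_sum

A = np.array([[0.8]])      # VAR transition matrix
C = np.array([[0.1]])      # shock loading
H = np.array([[1.0]])      # quadratic form matrix in (3.174)
x0 = np.array([1.0])
q0 = var_quadratic_sum(A, C, H, 0.95, x0)  # E sum_t beta^t x_t' H x_t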
Below the definition of the function, you will see definitions of two namedtuple objects, Economy
and Path
The first is used to collect all the parameters and primitives of a given LQ economy, while the
second collects output of the computations
In Python, a namedtuple is a popular data type from the collections module of the standard library that replicates the functionality of a tuple, but also allows you to assign a name to each tuple element
These elements can then be referenced via dotted attribute notation; see, for example, the use of path in the function gen_fig_1()
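For instance (an illustrative toy, not the actual definitions from lqramsey.py)

from collections import namedtuple

Point = namedtuple('Point', ('x', 'y'))
p = Point(x=1.0, y=2.0)
p.x   # Access by name rather than by index: returns 1.0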
The benefits of using namedtuples:
Keeps content organized by meaning
Helps reduce the number of global variables
Other than that, our code is long but relatively straightforward
Examples Our first example adopts a VAR specification for the exogenous process, with
$\beta = 1/1.05$
$b_t = 2.135$ and $s_t = d_t = 0$ for all $t$
Government spending evolves according to

$$g_{t+1} - \mu_g = \rho (g_t - \mu_g) + C_g w_{g,t+1}$$

with $\rho = 0.7$, $\mu_g = 0.35$ and $C_g = \mu_g \sqrt{1 - \rho^2} / 10$
After running the code above, if you then execute lqramsey.gen_fig_2(path) from your IPython
shell you will produce the figure
See the original manuscript for comments and interpretation
The Discrete Case Our second example adopts a discrete Markov specification for the exogenous process
Here's the code, from the file examples/lqramsey_discrete.py
"""
Filename: lqramsey_discrete.py
Authors: Thomas Sargent, Doc-Jin Jang, Jeong-hun Choi, John Stachurski
LQ Ramsey model with discrete exogenous process.
"""
from numpy import array
import lqramsey
# == Parameters == #
beta = 1 / 1.05
P = array([[0.8, 0.2, 0.0],
[0.0, 0.5, 0.5],
[0.0, 0.0, 1.0]])
# == Possible states of the world == #
# Each column is a state of the world. The rows are [g d b s 1]
x_vals = array([[0.5, 0.5, 0.25],
[0.0, 0.0, 0.0],
[2.2, 2.2, 2.2],
[0.0, 0.0, 0.0],
[1.0, 1.0, 1.0]])
Sg = array((1, 0, 0, 0, 0)).reshape(1, 5)
502
Sd = array((0, 1, 0, 0, 0)).reshape(1, 5)
Sb = array((0, 0, 1, 0, 0)).reshape(1, 5)
Ss = array((0, 0, 0, 1, 0)).reshape(1, 5)
economy = lqramsey.Economy(beta=beta,
Sg=Sg,
Sd=Sd,
Sb=Sb,
Ss=Ss,
discrete=True,
proc=(P, x_vals))
T = 15
path = lqramsey.compute_paths(T, economy)
lqramsey.gen_fig_1(path)
503
504
Solution notebook

History Dependent Public Policies

Overview
This lecture describes history-dependent public policies and some of their representations
History dependent policies are decision rules that depend on the entire past history of the state
variables
History dependent policies naturally emerge in Ramsey problems
A Ramsey planner (typically interpreted as a government) devises a plan of actions at time t = 0
to follow at all future dates and for all contingencies
In order to make a plan, he takes as given Euler equations expressing private agents' first-order necessary conditions
He also takes into account that his future actions affect earlier decisions by private agents, an
avenue opened up by the maintained assumption of rational expectations
Another setting in which history-dependent policies naturally emerge is where, instead of a Ramsey planner, there is a sequence of government administrators whose time $t$ member takes as given the policies used by its successors
We study these ideas in the context of a model in which a benevolent tax authority is forced
to raise a prescribed present value of revenues
to do so by imposing a distorting flat rate tax on the output of a competitive representative firm
The firm faces costs of adjustment and lives within a competitive equilibrium, which in turn imposes restrictions on the tax authority 13
References The presentation below is based on a recent paper by Evans and Sargent [ES13]
Regarding techniques, we will make use of the methods described in
1. the linear regulator lecture
2. the upcoming lecture on solving linear quadratic Stackelberg models
Two Sources of History Dependence
Sequence of Governments Timing Protocol For the second timing protocol we use the notion
of a sustainable plan proposed in [CK90], also referred to as a credible public policy in [Sto89]
A key idea here is that history-dependent policies can be arranged so that, when regarded as the representative firm's forecasting functions, they confront policy makers with incentives to confirm them
We follow Chang [Cha98] in expressing such history-dependent plans recursively
Credibility considerations contribute an additional auxiliary state variable in the form of a
promised value to the planner
It expresses how decisions must unfold to give the government the incentive to confirm private
sector expectations when the government chooses sequentially
Note: We occasionally hear confusion about the consequences of recursive representations of government policies under our two timing protocols. It is incorrect to regard a recursive representation of the Ramsey plan as in any way solving a time-inconsistency problem. On the contrary, the evolution of the auxiliary state variable that augments the authentic ones under our first timing protocol ought to be viewed as expressing the time-inconsistency of a Ramsey plan. Despite that, in literatures about practical monetary policy one sometimes hears interpretations that sell Ramsey plans in settings where our sequential timing protocol is the one that more accurately characterizes decision making. Please beware of discussions that toss around claims about credibility if you don't also see recursive representations of policies with the complete list of state variables appearing in our [Cha98]-like analysis that we present below.
Competitive equilibrium A representative competitive firm sells output $q_t$ at price $p_t$ when market-wide output is $Q_t$
The market as a whole faces a downward-sloping inverse demand function

$$p_t = A_0 - A_1 Q_t, \quad A_0 > 0, \; A_1 > 0 \tag{3.177}$$

Taking prices and the tax sequence $\{\tau_t\}$ as given, the representative firm chooses a production plan to maximize

$$\sum_{t=0}^{\infty} \beta^t \left[ p_t q_t - \frac{d}{2} (q_{t+1} - q_t)^2 - \tau_t q_t \right] \tag{3.178}$$

where $d > 0$ measures costs of adjustment
The firm's first-order conditions are

$$d (q_{t+1} - q_t) = \beta (p_{t+1} - \tau_{t+1}) + \beta d (q_{t+2} - q_{t+1}), \quad t = 0, 1, \ldots \tag{3.179}$$

Imposing $q_t = Q_t$ after solving the firm's problem 14, letting $u_t := Q_{t+1} - Q_t$, and substituting the demand curve (3.177) into (3.179) gives the implementability condition

$$u_{t+1} = -\frac{A_0}{d} + \frac{A_1}{d} Q_t + \left( \frac{A_1}{d} + \frac{1}{\beta} \right) u_t + \frac{1}{d} \tau_{t+1} \tag{3.180}$$

while market-wide output evolves according to

$$Q_{t+1} = Q_t + u_t \tag{3.181}$$
Below, we shall
Study history-dependent tax policies that either solve a Ramsey plan or are credible
Describe recursive representations of both types of history-dependent policies
Ramsey Problem
The planner's objective is cast in terms of consumer surplus net of the firm's adjustment costs

14 It is important not to set $q_t = Q_t$ prematurely. To make the firm a price taker, this equality should be imposed after and not before solving the firm's optimization problem.
15 We could instead, perhaps with more accuracy, define a promised marginal value as $\beta (A_0 - A_1 Q_{t+1}) - \beta \tau_{t+1} + \beta d u_{t+1}$, since this is the object to which the firm's first-order condition instructs it to equate to the marginal cost $d u_t$ of $u_t = q_{t+1} - q_t$. This choice would align better with how Chang [Cha98] chose to express his competitive equilibrium recursively. But given $(u_t, Q_t)$, the representative firm knows $(Q_{t+1}, \tau_{t+1})$, so it is adequate to take $u_{t+1}$ as the intermediate variable that summarizes how $\vec{\tau}_{t+1}$ affects the firm's choice of $u_t$.
Consumer surplus is

$$\int_0^Q (A_0 - A_1 x) \, dx = A_0 Q - \frac{A_1}{2} Q^2 \tag{3.182}$$

The government requires that the present value of tax revenues equal a given amount $G_0$:

$$\sum_{t=1}^{\infty} \beta^t \tau_t Q_t = G_0 \tag{3.183}$$
The Ramsey problem is to choose a tax sequence $\vec{\tau}$ and a competitive equilibrium outcome $(\vec{Q}, \vec{u})$ that maximize

$$\sum_{t=0}^{\infty} \beta^t \left\{ A_0 Q_t - \frac{A_1}{2} Q_t^2 - \frac{d}{2} u_t^2 \right\} \tag{3.184}$$

subject to (3.183)
Thus, the Ramsey timing protocol is:
1. At time 0, knowing $(Q_0, G_0)$, the Ramsey planner chooses $\{\tau_{t+1}\}_{t=0}^{\infty}$
2. Given $Q_0, \{\tau_{t+1}\}_{t=0}^{\infty}$, a competitive equilibrium outcome $\{u_t, Q_{t+1}\}_{t=0}^{\infty}$ emerges
Note: In bringing out the timing protocol associated with a Ramsey plan, we run head on into a set of issues analyzed by Bassetto [Bas05]. This is because our definition of the Ramsey timing protocol doesn't completely describe all conceivable actions by the government and firms as time unfolds. For example, the definition is silent about how the government would respond if firms, for some unspecified reason, were to choose to deviate from the competitive equilibrium associated with the Ramsey plan, possibly prompting violation of government budget balance. This is an example of the issues raised by [Bas05], who identifies a class of government policy problems whose proper formulation requires supplying a complete and coherent description of all actors' behavior across all possible histories. Implicitly, we are assuming that a more complete description of a government strategy could be specified that (a) agrees with ours along the Ramsey outcome, and (b) suffices uniquely to implement the Ramsey plan by deterring firms from taking actions that deviate from the Ramsey outcome path.
To formulate this problem as a Lagrangian, attach a Lagrange multiplier $\mu$ to the budget constraint (3.183)
The problem is then to choose $\{u_t, \tau_{t+1}\}$ to maximize

$$\sum_{t=0}^{\infty} \beta^t \left( A_0 Q_t - \frac{A_1}{2} Q_t^2 - \frac{d}{2} u_t^2 \right) + \mu \left[ \sum_{t=0}^{\infty} \beta^t \tau_t Q_t - G_0 - \tau_0 Q_0 \right] \tag{3.185}$$
Implementability Multiplier Approach The Ramsey problem is a special case of the linear
quadratic dynamic Stackelberg problem analyzed in the Stackelberg lecture
The idea is to construct a recursive representation of a Ramsey plan by including among the state
variables Lagrange multipliers on implementability constraints
These multipliers require the Ramsey planner to choose among competitive equilibrium allocations
Their motions through time become components of a recursive representation of a history-dependent plan for taxes
For us, the key implementability conditions are (3.180) for $t \geq 0$
Holding fixed $\mu$ and $G_0$, the Lagrangian for the planning problem can be abbreviated as

$$\max_{\{u_t, \tau_{t+1}\}} \sum_{t=0}^{\infty} \beta^t \left\{ A_0 Q_t - \frac{A_1}{2} Q_t^2 - \frac{d}{2} u_t^2 + \mu \tau_t Q_t \right\}$$
Define

$$z_t := \begin{bmatrix} 1 \\ Q_t \\ \tau_t \end{bmatrix} \quad \text{and} \quad y_t := \begin{bmatrix} z_t \\ u_t \end{bmatrix} = \begin{bmatrix} 1 \\ Q_t \\ \tau_t \\ u_t \end{bmatrix}$$

Here the elements of $z_t$ are genuine state variables and $u_t$ is a jump variable.
We include $\tau_t$ as a state variable for bookkeeping purposes: it helps to map the problem into a linear regulator problem with no cross products between states and controls
However, it will be a redundant state variable in the sense that the optimal tax $\tau_{t+1}$ will not depend on $\tau_t$
The government chooses $\tau_{t+1}$ at time $t$ as a function of the time $t$ state
Thus, we can rewrite the Ramsey problem as

$$\max_{\{y_t, \tau_{t+1}\}} -\sum_{t=0}^{\infty} \beta^t y_t' R y_t \tag{3.186}$$

subject to $z_0$ given and the law of motion

$$y_{t+1} = A y_t + B \tau_{t+1} \tag{3.187}$$

where

$$R = \begin{bmatrix} 0 & -\frac{A_0}{2} & 0 & 0 \\ -\frac{A_0}{2} & \frac{A_1}{2} & -\frac{\mu}{2} & 0 \\ 0 & -\frac{\mu}{2} & 0 & 0 \\ 0 & 0 & 0 & \frac{d}{2} \end{bmatrix}, \quad A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ -\frac{A_0}{d} & \frac{A_1}{d} & 0 & \frac{A_1}{d} + \frac{1}{\beta} \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 0 \\ 1 \\ \frac{1}{d} \end{bmatrix}$$
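In code, this step maps directly to QuantEcon's LQ class, as the implementation below does; a minimal fragment, assuming R, A, B and beta are defined as above

from quantecon import LQ

lq = LQ(0, -R, A, B, beta=beta)    # zero penalty on the control tau_{t+1}
P, F, d = lq.stationary_values()   # P solves the Riccati equation, F the rule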
Because this problem falls within the framework, we can proceed as follows
Letting $\lambda_t$ be a vector of Lagrange multipliers on the transition laws summarized in (3.187), it follows that

$$\lambda_t = P y_t$$

where $P$ solves the Riccati equation

$$P = R + \beta A' P A - \beta A' P B (B' P B)^{-1} B' P A$$

and the optimal tax rule takes the form $\tau_{t+1} = -F y_t$ for a matrix $F$ determined by $P$
Partitioning $P$ conformably with $y_t = \begin{bmatrix} z_t \\ u_t \end{bmatrix}$, we can write

$$\begin{bmatrix} \lambda_{zt} \\ \lambda_{ut} \end{bmatrix} = \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix} \begin{bmatrix} z_t \\ u_t \end{bmatrix}$$

Solving the second block row for $u_t$ and writing $\mu_{ut} := \lambda_{ut}$ gives

$$u_t = -P_{22}^{-1} P_{21} z_t + P_{22}^{-1} \mu_{ut}$$

Now the multiplier $\mu_{ut}$ becomes our authentic state variable, one that measures the cost to the government of confirming the representative firm's prior expectations about time $t$ government actions
The complete state at time $t$ becomes $\begin{bmatrix} z_t \\ \mu_{ut} \end{bmatrix}$, and

$$y_t = \begin{bmatrix} z_t \\ u_t \end{bmatrix} = \begin{bmatrix} I & 0 \\ -P_{22}^{-1} P_{21} & P_{22}^{-1} \end{bmatrix} \begin{bmatrix} z_t \\ \mu_{ut} \end{bmatrix}$$

so

$$\tau_{t+1} = -F \begin{bmatrix} I & 0 \\ -P_{22}^{-1} P_{21} & P_{22}^{-1} \end{bmatrix} \begin{bmatrix} z_t \\ \mu_{ut} \end{bmatrix}$$

The evolution of the state starts from the initial condition

$$\begin{bmatrix} z_0 \\ \mu_{u0} \end{bmatrix} = \begin{bmatrix} 1 & Q_0 & \tau_0 & 0 \end{bmatrix}' \tag{3.188}$$

Equation (3.188) incorporates the finding that the Ramsey planner finds it optimal to set $\mu_{u0}$ to zero
Kydland-Prescott Approach Kydland and Prescott [KP80a] or Chang [Cha98] would formulate
our Ramsey problem in terms of the Bellman equation
$$v(Q_t, \tau_t, u_t) = \max_{\tau_{t+1}} \left\{ A_0 Q_t - \frac{A_1}{2} Q_t^2 - \frac{d}{2} u_t^2 + \mu \tau_t Q_t + \beta v(Q_{t+1}, \tau_{t+1}, u_{t+1}) \right\}$$

where the maximization is subject to the constraints

$$Q_{t+1} = Q_t + u_t$$

and

$$u_{t+1} = -\frac{A_0}{d} + \frac{A_1}{d} Q_t + \left( \frac{A_1}{d} + \frac{1}{\beta} \right) u_t + \frac{1}{d} \tau_{t+1}$$
We again work with the state vector

$$y_t = \begin{bmatrix} z_t \\ u_t \end{bmatrix} = \begin{bmatrix} 1 \\ Q_t \\ \tau_t \\ u_t \end{bmatrix}$$

where $z_t = \begin{bmatrix} 1 & Q_t & \tau_t \end{bmatrix}'$ contains the authentic state variables, and $u_t$ is a variable whose time 0 value is a jump variable but whose values for dates $t \geq 1$ will become state variables that encode history dependence in the Ramsey plan
Write a dynamic programming problem in the style of [KP80a] as

$$v(y_t) = \max_{\tau_{t+1}} \left\{ -y_t' R y_t + \beta v(y_{t+1}) \right\} \tag{3.189}$$

where the maximization is subject to the constraint $y_{t+1} = A y_t + B \tau_{t+1}$ and where

$$R = \begin{bmatrix} 0 & -\frac{A_0}{2} & 0 & 0 \\ -\frac{A_0}{2} & \frac{A_1}{2} & -\frac{\mu}{2} & 0 \\ 0 & -\frac{\mu}{2} & 0 & 0 \\ 0 & 0 & 0 & \frac{d}{2} \end{bmatrix}, \quad A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ -\frac{A_0}{d} & \frac{A_1}{d} & 0 & \frac{A_1}{d} + \frac{1}{\beta} \end{bmatrix}, \quad \text{and} \quad B = \begin{bmatrix} 0 \\ 0 \\ 1 \\ \frac{1}{d} \end{bmatrix}$$
The value function takes the form $v(y_t) = -y_t' P y_t$, where $P$ solves the same Riccati equation as before
Because the time 0 value $u_0$ is a jump variable, it is chosen to maximize $v(y_0)$, i.e., to satisfy

$$\frac{\partial v}{\partial u_0} = 0$$
If we partition $P$ as

$$P = \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}$$

then we have

$$0 = \frac{\partial}{\partial u_0} \left( z_0' P_{11} z_0 + z_0' P_{12} u_0 + u_0' P_{21} z_0 + u_0' P_{22} u_0 \right) = P_{12}' z_0 + P_{21} z_0 + 2 P_{22} u_0$$

which, since $P_{12}' = P_{21}$, implies

$$u_0 = -P_{22}^{-1} P_{21} z_0 \tag{3.190}$$
Thus, with initial state $z_0$, the Ramsey plan is generated by setting

$$y_0 = \begin{bmatrix} z_0 \\ -P_{22}^{-1} P_{21} z_0 \end{bmatrix}$$

and then iterating

$$\tau_{t+1} = -F \begin{bmatrix} z_t \\ u_t \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} z_{t+1} \\ u_{t+1} \end{bmatrix} = (A - BF) \begin{bmatrix} z_t \\ u_t \end{bmatrix}$$
Comparison We can compare the outcome from the Kydland-Prescott approach to the outcome
of the Lagrangian multiplier on the implementability constraint approach of the preceding section
Using the formula

$$\begin{bmatrix} z_t \\ u_t \end{bmatrix} = \begin{bmatrix} I & 0 \\ -P_{22}^{-1} P_{21} & P_{22}^{-1} \end{bmatrix} \begin{bmatrix} z_t \\ \mu_{ut} \end{bmatrix}$$

and applying it to the evolution of the state

$$\begin{bmatrix} z_{t+1} \\ u_{t+1} \end{bmatrix} = (A - BF) \begin{bmatrix} I & 0 \\ -P_{22}^{-1} P_{21} & P_{22}^{-1} \end{bmatrix} \begin{bmatrix} z_t \\ \mu_{ut} \end{bmatrix}$$

we get

$$\begin{bmatrix} z_{t+1} \\ u_{t+1} \end{bmatrix} = (A - BF) \begin{bmatrix} z_t \\ u_t \end{bmatrix} \tag{3.191}$$

or $y_{t+1} = A_F y_t$, where $A_F := A - BF$
Then using the initial state value $\mu_{u,0} = 0$, we obtain

$$y_0 = \begin{bmatrix} z_0 \\ -P_{22}^{-1} P_{21} z_0 \end{bmatrix} \tag{3.192}$$

This is identical to the initial condition derived under the Kydland-Prescott approach, so the two approaches generate the same Ramsey plan
In summary, the Ramsey plan can be represented recursively: defining the initialization rule

$$\upsilon(Q_t \,|\, \mu) := -P_{22}^{-1}(\mu) P_{21}(\mu) z_t \tag{3.193}$$

the plan sets $u_0 = \upsilon(Q_0 \,|\, \mu)$ and thereafter taxes and outcomes unfold according to

$$\tau_{t+1} = \tau(Q_t, u_t \,|\, \mu) \tag{3.194}$$

$$Q_{t+1} = Q_t + u_t \tag{3.195}$$

$$u_{t+1} = u(Q_t, u_t \,|\, \mu) \tag{3.196}$$
import numpy as np
from quantecon import LQ
from quantecon.matrix_eqn import solve_discrete_lyapunov
from scipy.optimize import root


def computeG(A0, A1, d, Q0, tau0, beta, mu):
    """
    Compute government income given mu and return tax revenues and
    policy matrices for the planner.

    Parameters
    ----------
    A0 : float
        A constant parameter
    A1 : float
        A constant parameter
    d : float
        A constant parameter
    Q0 : float
        An initial condition
    tau0 : float
        An initial condition
    beta : float
        A constant parameter
    mu : float
        Lagrange multiplier

    Returns
    -------
    T0 : array(float)
        Present discounted value of government spending
    A : array(float)
        One of the transition matrices for the states
    B : array(float)
        Another transition matrix for the states
    F : array(float)
        Policy rule matrix
    P : array(float)
        Value function matrix
    """
    # Create matrices for solving the Ramsey problem
    R = np.array([[0, -A0/2, 0, 0],
                  [-A0/2, A1/2, -mu/2, 0],
                  [0, -mu/2, 0, 0],
                  [0, 0, 0, d/2]])

    A = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 1],
                  [0, 0, 0, 0],
                  [-A0/d, A1/d, 0, A1/d+1/beta]])

    B = np.array([0, 0, 1, 1/d]).reshape(-1, 1)

    Q = 0

    # Use LQ to solve the Ramsey problem.
    lq = LQ(Q, -R, A, B, beta=beta)
    P, F, d = lq.stationary_values()

    # Need y_0 to compute government tax revenue.
    P21 = P[3, :3]
    P22 = P[3, 3]
    z0 = np.array([1, Q0, tau0]).reshape(-1, 1)
    u0 = -P22**(-1) * P21.dot(z0)
    y0 = np.vstack([z0, u0])

    # Define A_F and S matrices
    AF = A - B.dot(F)
    S = np.array([0, 1, 0, 0]).reshape(-1, 1).dot(np.array([[0, 0, 1, 0]]))

    # Solve the discrete Lyapunov equation (3.198)
    temp = beta * AF.T.dot(S).dot(AF)
    Omega = solve_discrete_lyapunov(np.sqrt(beta) * AF.T, temp)
    T0 = y0.T.dot(Omega).dot(y0)

    return T0, A, B, F, P
# == Primitives == #
T    = 20
A0   = 100.0
A1   = 0.05
d    = 0.20
beta = 0.95

# == Initial conditions == #
mu0  = 0.0025
Q0   = 1000.0
tau0 = 0.0
def gg(mu):
    """
    Computes the tax revenues for the government given Lagrangian
    multiplier mu.
    """
    return computeG(A0, A1, d, Q0, tau0, beta, mu)
# == Solve the Ramsey problem and associated government revenue == #
G0, A, B, F, P = gg(mu0)

# == Compute the optimal u0 == #
P21 = P[3, :3]
P22 = P[3, 3]
z0 = np.array([1, Q0, tau0]).reshape(-1, 1)
u0 = -P22**(-1) * P21.dot(z0)
# == Initialize vectors == #
y         = np.zeros((4, T))
uhat      = np.zeros(T)
uhatdif   = np.zeros(T)
tauhat    = np.zeros(T)
tauhatdif = np.zeros(T-1)
mu        = np.zeros(T)
G         = np.zeros(T)
GPay      = np.zeros(T)

# == Initial conditions == #
G[0] = G0
mu[0] = mu0
uhatdif[0] = 0
uhat[0] = u0
y[:, 0] = np.vstack([z0, u0]).flatten()
for t in range(1, T):
    # Iterate government policy
    y[:, t] = (A - B.dot(F)).dot(y[:, t-1])

    # Update G
    G[t] = (G[t-1] - beta * y[1, t] * y[2, t]) / beta
    GPay[t] = beta * y[1, t] * y[2, t]

    # Compute the mu if the government were able to reset its plan.
    # ff is the tax revenues the government would receive if it reset the
    # plan with Lagrange multiplier mu, minus current G
    ff = lambda mu: (gg(mu)[0] - G[t]).flatten()

    # Find ff = 0
    mu[t] = root(ff, mu[t-1]).x
    temp, Atemp, Btemp, Ftemp, Ptemp = gg(mu[t])

    # Compute alternative decisions
    P21temp = Ptemp[3, :3]
    P22temp = Ptemp[3, 3]
    uhat[t] = -P22temp**(-1) * P21temp.dot(y[:3, t])
The next figure shows the Ramsey plan's outcome for $(Q_t, u_t)$
From top to bottom, the panels show $Q_t$, $\tau_t$ and $u_t := Q_{t+1} - Q_t$ over $t = 0, \ldots, 15$
The optimal decision rule is 16

$$\tau_{t+1} = 248.0624 - 0.1242 Q_t - 0.3347 u_t \tag{3.197}$$
Notice how the Ramsey plan calls for a high tax at t = 1 followed by a perpetual stream of lower
taxes
16 As promised, $\tau_t$ does not appear in the Ramsey planner's decision rule for $\tau_{t+1}$.
Taxing heavily at first, less later expresses the time-inconsistency of the optimal plan for $\{\tau_{t+1}\}_{t=0}^{\infty}$
We'll characterize this formally after first discussing how to compute $\mu$
Computing $\mu$ Define the selector vectors $e_\tau = \begin{bmatrix} 0 & 0 & 1 & 0 \end{bmatrix}'$ and $e_Q = \begin{bmatrix} 0 & 1 & 0 & 0 \end{bmatrix}'$ and express $\tau_t = e_\tau' y_t$ and $Q_t = e_Q' y_t$
Evidently $Q_t \tau_t = y_t' e_Q e_\tau' y_t = y_t' S y_t$ where $S := e_Q e_\tau'$
We want to compute

$$T_0 = \sum_{t=1}^{\infty} \beta^t \tau_t Q_t = \beta \tau_1 Q_1 + \beta T_1$$

where $T_1 = \sum_{t=2}^{\infty} \beta^{t-1} \tau_t Q_t$
The present values $T_0$ and $T_1$ are connected by

$$T_0 = \beta y_0' A_F' S A_F y_0 + \beta T_1$$

Guessing a solution of the form $T_t = y_t' \Omega y_t$ and substituting yields the fixed point requirement

$$\Omega = \beta A_F' S A_F + \beta A_F' \Omega A_F \tag{3.198}$$

Equation (3.198) is a discrete Lyapunov equation that can be solved for $\Omega$ using QuantEcon's solve_discrete_lyapunov function
The matrix $F$ and therefore the matrix $A_F = A - BF$ depend on $\mu$
To find a $\mu$ that guarantees that $T_0 = G_0$, we proceed as follows (a sketch of this search in code appears after the list):
1. Guess an initial $\mu$; compute a tentative Ramsey plan and the implied $T_0 = y_0' \Omega(\mu) y_0$
2. If $T_0 > G_0$, lower $\mu$; if $T_0 < G_0$, raise $\mu$
3. Repeat step 2 until $T_0 = G_0$
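A minimal sketch of this search by bisection (T0_of_mu is a hypothetical helper mapping mu into the implied revenue; the listing above instead uses scipy.optimize.root for the same purpose)

def find_mu(T0_of_mu, G0, mu_lo=0.0, mu_hi=1.0, tol=1e-8):
    # Bisect on mu until the implied revenue T0 matches the requirement G0
    while mu_hi - mu_lo > tol:
        mu = (mu_lo + mu_hi) / 2
        if T0_of_mu(mu) > G0:
            mu_hi = mu   # T0 too high: lower mu
        else:
            mu_lo = mu   # T0 too low: raise mu
    return (mu_lo + mu_hi) / 2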
Time Inconsistency Recall that the Ramsey planner chose $\{u_t, \tau_{t+1}\}$ to maximize

$$\sum_{t=0}^{\infty} \beta^t \left\{ A_0 Q_t - \frac{A_1}{2} Q_t^2 - \frac{d}{2} u_t^2 \right\}$$

subject to (3.180), (3.181) and (3.183)
Define the continuation value of the Ramsey plan as

$$w(Q_0, u_0 \,|\, \mu_0) = \sum_{t=0}^{\infty} \beta^t \left\{ A_0 Q_t - \frac{A_1}{2} Q_t^2 - \frac{d}{2} u_t^2 \right\} \tag{3.199}$$
where
$\{Q_t, u_t\}_{t=0}^{\infty}$ are evaluated under the Ramsey plan whose recursive representation is given by (3.194), (3.195), (3.196)
$\mu_0$ is the value of the Lagrange multiplier that assures budget balance, computed as described above
Evidently, these continuation values satisfy the recursion

$$w(Q_t, u_t \,|\, \mu_0) = A_0 Q_t - \frac{A_1}{2} Q_t^2 - \frac{d}{2} u_t^2 + \beta w(Q_{t+1}, u_{t+1} \,|\, \mu_0) \tag{3.200}$$
As an intermediate step, define the continuation revenues that the government must raise from time $t$ onward as 17

$$G_t = \beta^{-t} \left( G_0 - \sum_{s=1}^{t} \beta^s \tau_s Q_s \right) \tag{3.201}$$

where $\{\tau_t, Q_t\}_{t=0}^{\infty}$ is the original Ramsey outcome
Then at time $t \geq 1$,
1. take $(Q_t, G_t)$ inherited from the original Ramsey plan as initial conditions
2. invite a brand new Ramsey planner to compute a new Ramsey plan, solving for a new $u_t$, to be called $\check{u}_t$, and for a new $\mu$, to be called $\check{\mu}_t$
The revised Lagrange multiplier $\check{\mu}_t$ is chosen so that, under the new Ramsey plan, the government is able to raise enough continuation revenues $G_t$ given by (3.201)
Would this new Ramsey plan be a continuation of the original plan?
The answer is no, because along a Ramsey plan, for $t \geq 1$, in general it is true that

$$w\left( Q_t, \upsilon(Q_t \,|\, \check{\mu}) \,\middle|\, \check{\mu} \right) > w(Q_t, u_t \,|\, \mu_0) \tag{3.202}$$
Inequality (3.202) expresses a continuation Ramsey planner's incentive to deviate from a time 0 Ramsey plan by
1. resetting $u_t$ according to (3.193)
2. adjusting the Lagrange multiplier on the continuation appropriately to account for tax revenues already collected 18
Inequality (3.202) expresses the time-inconsistency of a Ramsey plan
17 The continuation revenues $G_t$ are the time $t$ present value of revenues that must be raised to satisfy the original time 0 government intertemporal budget constraint, taking into account the revenues already raised from $s = 1, \ldots, t$ under the original Ramsey plan.
18 For example, let the Ramsey plan yield time 1 revenues $Q_1 \tau_1$. Then at time 1, a continuation Ramsey planner would want to raise continuation revenues, expressed in units of time 1 goods, of $\tilde{G}_1 := \frac{G - \beta Q_1 \tau_1}{\beta}$. To finance the remainder revenues, the continuation Ramsey planner would find a continuation Lagrange multiplier by applying the three-step procedure from the previous section to revenue requirements $\tilde{G}_1$.
A Simulation To bring out the time inconsistency of the Ramsey plan, we compare
the time $t$ values of $\tau_{t+1}$ under the original Ramsey plan with
the value $\check{\tau}_{t+1}$ associated with a new Ramsey plan begun at time $t$ with initial conditions $(Q_t, G_t)$ generated by following the original Ramsey plan
Here again $G_t := \beta^{-t} \left( G_0 - \sum_{s=1}^{t} \beta^s \tau_s Q_s \right)$
The difference $\Delta \tau_t := \check{\tau}_t - \tau_t$ is shown in the top panel of the following figure
In the second panel we compare the time $t$ outcome for $u_t$ under the original Ramsey plan with the time $t$ value of this new Ramsey problem starting from $(Q_t, G_t)$
To compute $\check{u}_t$ under the new Ramsey plan, we use the following version of formula (3.190):

$$\check{u}_t = -P_{22}^{-1}(\check{\mu}_t) P_{21}(\check{\mu}_t) z_t$$

Here $z_t$ is evaluated along the Ramsey outcome path, and we have written $P(\check{\mu}_t)$ to emphasize the dependence of $P$ on the Lagrange multiplier $\mu_0$ 19
To compute $u_t$ along the Ramsey path, we just iterate the recursion (3.191) starting from the initial $Q_0$, with $u_0$ given by formula (3.190)
Thus the second panel indicates how far the reinitialized value $\check{u}_t$ departs from the time $t$ outcome along the Ramsey plan
Note that the restarted plan raises the time $t + 1$ tax and consequently lowers the time $t$ value of $u_t$
Associated with the new Ramsey plan at t is a value of the Lagrange multiplier on the continuation
government budget constraint
This is the third panel of the figure
The fourth panel plots the required continuation revenues Gt implied by the original Ramsey plan
These figures help us understand the time inconsistency of the Ramsey plan
Further Intuition One feature to note is the large difference between $\check{\tau}_{t+1}$ and $\tau_{t+1}$ in the top panel of the figure
If the government is able to reset to a new Ramsey plan at time $t$, it chooses a significantly higher tax rate than if it were required to maintain the original Ramsey plan
The intuition here is that the government is required to finance a given present value of expenditures with distorting taxes
The quadratic adjustment costs prevent firms from reacting strongly to variations in the tax rate for next period, which tilts a time $t$ Ramsey planner toward using time $t + 1$ taxes
As was noted before, this is evident in the first figure, where the government taxes the next period heavily and then falls back to a constant tax from then on
This can also be seen in the third panel of the second figure, where the government pays off a significant portion of the debt using the first period tax rate
The similarities between the graphs in the last two panels of the second figure reveal that there is a one-to-one mapping between $G$ and $\mu$
The Ramsey plan can then only be time consistent if Gt remains constant over time, which will not
be true in general
Credible Policy We express the theme of this section in the following: In general, a continuation
of a Ramsey plan is not a Ramsey plan
This is sometimes summarized by saying that a Ramsey plan is not credible
19 It can be verified that this formula puts non-zero weight only on the components 1 and $Q_t$ of $z_t$.
The government's continuation value satisfies

$$J_t = A_0 Q_t - \frac{A_1}{2} Q_t^2 - \frac{d}{2} u_t^2 + \beta J_{t+1}(\tau_{t+1}, G_{t+1}) \tag{3.203}$$

where continuation revenues obey

$$G_t = \tau_{t+1} Q_{t+1} + \beta G_{t+1}, \quad t \geq 0 \tag{3.204}$$

It is natural to regard the time $t$ government as choosing $(\tau_{t+1}, G_{t+1})$ subject to constraint (3.204)
To express the notion of a credible government plan concisely, we expand the strategy space by also adding $J_t$ itself as a state variable and allowing policies to take the following recursive forms 20
Regard $J_0$ as a discounted present value promised to the Ramsey planner and take it as an initial condition.
Then after choosing $u_0$ according to

$$u_0 = \upsilon(Q_0, G_0, J_0), \tag{3.205}$$

choose subsequent taxes, outputs, and continuation values according to recursions that can be represented as

$$\hat{\tau}_{t+1} = \tau(Q_t, u_t, G_t, J_t) \tag{3.206}$$

$$u_{t+1} = \xi(Q_t, u_t, G_t, J_t, \tau_{t+1}) \tag{3.207}$$

$$G_{t+1} = \frac{G_t - \tau_{t+1} Q_{t+1}}{\beta} \tag{3.208}$$

(3.209)
Here
$\hat{\tau}_{t+1}$ is the time $t + 1$ government action called for by the plan, while
$\tau_{t+1}$ is possibly some one-time deviation that the time $t + 1$ government contemplates, and
$G_{t+1}$ is the associated continuation tax collections
The plan is said to be credible if, for each $t$ and each state $(Q_t, u_t, G_t, J_t)$, the plan satisfies the incentive constraint

$$J_t = A_0 Q_t - \frac{A_1}{2} Q_t^2 - \frac{d}{2} u_t^2 + \beta J_{t+1}(\hat{\tau}_{t+1}, \hat{G}_{t+1}) \geq A_0 Q_t - \frac{A_1}{2} Q_t^2 - \frac{d}{2} u_t^2 + \beta J_{t+1}(\tau_{t+1}, G_{t+1}) \tag{3.210}$$

for all deviations $\tau_{t+1}$, where

$$G_{t+1} = \frac{G_t - \tau_{t+1} Q_{t+1}}{\beta} \tag{3.211}$$

Inequality (3.210) expresses that continuation values adjust to deviations in ways that discourage the government from deviating from the prescribed $\hat{\tau}_{t+1}$
Inequality (3.210) indicates that two continuation values $J_{t+1}$ contribute to sustaining time $t$ promised value $J_t$
$J_{t+1}(\hat{\tau}_{t+1}, \hat{G}_{t+1})$ is the continuation value when the government chooses to confirm the private sector's expectation, formed according to the decision rule (3.206) 21
$J_{t+1}(\tau_{t+1}, G_{t+1})$ tells the continuation consequences should the government disappoint the private sector's expectations

20 This choice is the key to what [LS12] call dynamic programming squared.
21 Note the double role played by (3.206): as decision rule for the government and as the private sector's rule for forecasting government actions.
The internal structure of a credible plan deters deviations from it
That (3.210) maps two continuation values $J_{t+1}(\tau_{t+1}, G_{t+1})$ and $J_{t+1}(\hat{\tau}_{t+1}, \hat{G}_{t+1})$ into one promised value $J_t$ reflects how a credible plan arranges a system of private sector expectations that induces the government to choose to confirm them
Chang [Cha98] builds on how inequality (3.210) maps two continuation values into one
Remark Let $\mathcal{J}$ be the set of values associated with credible plans
Every value $J \in \mathcal{J}$ can be attained by a credible plan that has a recursive representation of the form (3.206), (3.207), (3.208)
The set of values can be computed as the largest fixed point of an operator that maps sets of
candidate values into sets of values
Given a value within this set, it is possible to construct a government strategy of the recursive
form (3.206), (3.207), (3.208) that attains that value
In many cases, there is a set of values and associated credible plans
In those cases where the Ramsey outcome is credible, a multiplicity of credible plans is a key part
of the story because, as we have seen earlier, a continuation of a Ramsey plan is not a Ramsey plan
For it to be credible, a Ramsey outcome must be supported by a worse outcome associated with
another plan, the prospect of reversion to which sustains the Ramsey outcome
Concluding remarks
The term optimal policy, which pervades an important applied monetary economics literature,
means different things under different timing protocols
Under the static Ramsey timing protocol (i.e., choose a sequence once-and-for-all), we obtain a
unique plan
Here the phrase optimal policy seems to fit well, since the Ramsey planner optimally reaps early benefits from influencing the private sector's beliefs about the government's later actions
When we adopt the sequential timing protocol associated with credible public policies, optimal
policy is a more ambiguous description
There is a multiplicity of credible plans
True, the theory explains how it is optimal for the government to confirm the private sector's expectations about its actions along a credible plan
But some credible plans have very bad outcomes
These bad outcomes are central to the theory because it is the presence of bad credible plans that
makes possible better ones by sustaining the low continuation values that appear in the second
line of incentive constraint (3.210)
Recently, many have taken for granted that optimal policy means follow the Ramsey plan 22
In pursuit of more attractive ways to describe a Ramsey plan when policy making is in practice
done sequentially, some writers have repackaged a Ramsey plan in the following way
Take a Ramsey outcome - a sequence of endogenous variables under a Ramsey plan - and
reinterpret it (or perhaps only a subset of its variables) as a target path of relationships among
outcome variables to be assigned to a sequence of policy makers 23
If appropriate (infinite dimensional) invertibility conditions are satisfied, it can happen that
following the Ramsey plan is the only way to hit the target path 24
The spirit of this work is to say, in a democracy we are obliged to live with the sequential timing protocol, so let's constrain policy makers' objectives in ways that will force them to follow a Ramsey plan in spite of their benevolence 25
By this sleight of hand, we acquire a theory of an optimal outcome target path
This invertibility argument leaves open two important loose ends:
1. implementation, and
2. time consistency
As for (1), repackaging a Ramsey plan (or the tail of a Ramsey plan) as a target outcome sequence
does not confront the delicate issue of how that target path is to be implemented 26
As for (2), it is an interesting question whether the invertibility logic can repackage and conceal
a Ramsey plan well enough to make policy makers forget or ignore the benevolent intentions that
give rise to the time inconsistency of a Ramsey plan in the first place
To attain such an optimal output path, policy makers must forget their benevolent intentions, because there will inevitably occur temptations to deviate from that target path and from the implied relationships among variables like inflation, output, and interest rates along it
Remark The continuation of such an optimal target path is not an optimal target path
22 It is possible to read [Woo03] and [GW10] as making some carefully qualified statements of this type. Some of the qualifications can be interpreted as advice eventually to follow a tail of a Ramsey plan.
23 In our model, the Ramsey outcome would be a path $(\vec{p}, \vec{Q})$.
24 See [GW10].
25 Sometimes the analysis is framed in terms of following the Ramsey plan only from some future date $T$ onwards.
26 See [Bas05] and [ACK10].
CHAPTER
FOUR
SOLUTIONS
Each lecture with exercises has a link to solutions immediately after the exercises
The links are to static versions of IPython Notebook files; the directory of the originals is here
If you look at a typical solution notebook you'll see a download icon on the top right
You can download a copy of the ipynb file (the notebook file) using that icon
Now start IPython Notebook and navigate to the downloaded ipynb file
Once you open it in IPython Notebook it should be running live, allowing you to make changes
CHAPTER
FIVE
5.1 FAQs
5.2 How do I install Python?
See this lecture
5.5 Where do I get all the Python programs from the lectures?
To import the quantecon library, see this discussion
To get all the code at once, visit our public code repository
https://github.com/QuantEcon/QuantEcon.py
See this lecture for more details on how to download the programs

PDF Lectures
REFERENCES
[Aiy94] S Rao Aiyagari. Uninsured Idiosyncratic Risk and Aggregate Saving. The Quarterly Journal of Economics, 109(3):659–684, 1994.
[AM05]
[BD86] David Backus and John Driffill. The consistency of optimal policy in stochastic rational expectations models. Technical Report, CEPR Discussion Papers, 1986.
[Bar79] Robert J Barro. On the Determination of the Public Debt. Journal of Political Economy, 87(5):940–971, 1979.
[Bas05]
[BS79]
[Bis06]
[Cal78]
[Car01] Christopher D Carroll. A Theory of the Consumption Function, with and without Liquidity Constraints. Journal of Economic Perspectives, 15(3):23–45, 2001.
[Cha98] Roberto Chang. Credible monetary policy in an infinite horizon model: recursive approaches. Journal of Economic Theory, 81(2):431–461, 1998.
[CK90] Varadarajan V Chari and Patrick J Kehoe. Sustainable plans. Journal of Political Economy, pages 783–802, 1990.
[Col90] Wilbur John Coleman. Solving the Stochastic Growth Model by Policy-Function Iteration. Journal of Business & Economic Statistics, 8(1):27–29, 1990.
[CC08] J. D. Cryer and K-S. Chan. Time Series Analysis. Springer, 2nd edition, 2008.
[Dea91]
[DP94] Angus Deaton and Christina Paxson. Intertemporal Choice and Inequality. Journal of Political Economy, 102(3):437–467, 1994.
[DH10] Wouter J Den Haan. Comparison of solutions to the incomplete markets model with aggregate uncertainty. Journal of Economic Dynamics and Control, 34(1):4–27, 2010.
[DS10] Ulrich Doraszelski and Mark Satterthwaite. Computable Markov-perfect industry dynamics. The RAND Journal of Economics, 41(2):215–243, 2010.
[DLP13]
[Dud02] R M Dudley. Real Analysis and Probability. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2002.
[EG87] Robert F Engle and Clive W J Granger. Co-integration and Error Correction: Representation, Estimation, and Testing. Econometrica, 55(2):251–276, 1987.
[EP95] Richard Ericson and Ariel Pakes. Markov-perfect industry dynamics: a framework for empirical work. The Review of Economic Studies, 62(1):53–82, 1995.
[ES13] David Evans and Thomas J Sargent. History dependent public policies. Oxford University Press, 2013.
[EH01]
[Fri56]
[GW10] Marc P Giannoni and Michael Woodford. Optimal target criteria for stabilization policy. Technical Report, National Bureau of Economic Research, 2010.
[Hal78]
[HM82] Robert E Hall and Frederic S Mishkin. The Sensitivity of Consumption to Transitory Income: Estimates from Panel Data on Households. National Bureau of Economic Research Working Paper Series, 1982.
[Ham05] James D Hamilton. What's real about the business cycle? Federal Reserve Bank of St. Louis Review, pages 435–452, 2005.
[HR85] Dennis Epple, Lars P. Hansen, and Will Roberds. Linear-quadratic duopoly models of resource depletion. In Energy, Foresight, and Strategy. Resources for the Future, vol 1 edition, 1985.
[HS08]
[HS13] L P Hansen and T J Sargent. Recursive Models of Dynamic Linear Economies. The Gorman Lectures in Economics. Princeton University Press, 2013.
[HS00]
[HLL96] O Hernandez-Lerma and J B Lasserre. Discrete-Time Markov Control Processes: Basic Optimality Criteria. Volume 1 of Applications of Mathematics: Stochastic Modelling and Applied Probability. Springer, 1996.
[HP92]
[HR93] Hugo A Hopenhayn and Richard Rogerson. Job Turnover and Policy Evaluation: A General Equilibrium Analysis. Journal of Political Economy, 101(5):915–938, 1993.
[Hug93]
[Janich94] K Jänich. Linear Algebra. Springer Undergraduate Texts in Mathematics and Technology. Springer, 1994.
[Kam12] Takashi Kamihigashi. Elementary results on solutions to the Bellman equation of dynamic programming: existence, uniqueness, and convergence. Technical Report, Kobe University, 2012.
[Kuh13] Moritz Kuhn. Recursive Equilibria In An Aiyagari-Style Economy With Permanent Income Shocks. International Economic Review, 54:807–835, 2013.
[KP80a] Finn E Kydland and Edward C Prescott. Dynamic optimal taxation, rational expectations and optimal control. Journal of Economic Dynamics and Control, 2:79–91, 1980.
[KP77] Finn E Kydland and Edward C Prescott. Rules rather than discretion: the inconsistency of optimal plans. Journal of Political Economy, 85(3):473–491, 1977.
[KP80b] Finn E Kydland and Edward C Prescott. Time to build and aggregate fluctuations. Econometrica, 50(6):1345–1370, 1982.
[LM94] A Lasota and M C MacKey. Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics. Applied Mathematical Sciences. Springer-Verlag, 1994.
[LM80] David Levhari and Leonard J Mirman. The great fish war: an example using a dynamic Cournot-Nash solution. The Bell Journal of Economics, pages 322–334, 1980.
[LS12]
[Luc78] Robert E Lucas, Jr. Asset prices in an exchange economy. Econometrica: Journal of the Econometric Society, 46(6):1429–1445, 1978.
[LP71]
[LS83] Robert E Lucas, Jr and Nancy L Stokey. Optimal Fiscal and Monetary Policy in an Economy without Capital. Journal of Monetary Economics, 12(3):55–93, 1983.
[MS89] Albert Marcet and Thomas J Sargent. Convergence of Least-Squares Learning in Environments with Hidden State Variables and Private Information. Journal of Political Economy, 97(6):1306–1322, 1989.
[MdRV10] V Filipe Martins-da-Rocha and Yiannis Vailakis. Existence and Uniqueness of a Fixed Point for Local Contractions. Econometrica, 78(3):1127–1141, 2010.
[McC70] J J McCall. Economics of Information and Job Search. The Quarterly Journal of Economics, 84(1):113–126, 1970.
[MP85] Rajnish Mehra and Edward C Prescott. The equity premium: A puzzle. Journal of Monetary Economics, 15(2):145–161, 1985.
[MT09] S P Meyn and R L Tweedie. Markov Chains and Stochastic Stability. Cambridge University Press, 2009.
[MS85a] Marcus Miller and Mark Salmon. Dynamic games and the time inconsistency of optimal policy in open economies. The Economic Journal, pages 124–137, 1985.
[MS85b] Marcus Miller and Mark Salmon. Dynamic Games and the Time Inconsistency of Optimal Policy in Open Economies. Economic Journal, 95:124–137, 1985.
[MB54]
[Nea99] Derek Neal. The Complexity of Job Mobility among Young Men. Journal of Labor Economics, 17(2):237–261, 1999.
[Par99]
[PL92] J G Pearlman, D A Currie, and P L Levine. Rational expectations with partial information. Economic Modelling, 3:90–105, 1992.
[Pea92]
[PCL86] Joseph Pearlman, David Currie, and Paul Levine. Rational expectations models with partial information. Economic Modelling, 3(2):90–105, 1986.
[Pre77] Edward C. Prescott. Should control theory be used for economic stabilization? Journal of Monetary Economics, 7:13–38, 1977.
[Rab02] Guillaume Rabault. When do borrowing constraints bind? Some new results on the income fluctuation problem. Journal of Economic Dynamics and Control, 26(2):217–245, 2002.
[Ram27]
[Rei09]
[Rya12]
[Sar79]
[Sar87]
[SE77] Jack Schechtman and Vera L S Escudero. Some results on an income fluctuation problem. Journal of Economic Theory, 16(2):151–166, 1977.
[Sch69]
[Shi95] A N Shiriaev. Probability. Graduate Texts in Mathematics. Springer, 2nd edition, 1995.
[SLP89] N L Stokey, R E Lucas, and E C Prescott. Recursive Methods in Economic Dynamics. Harvard University Press, 1989.
[Sto89] Nancy L Stokey. Reputation and time consistency. The American Economic Review, pages 134–139, 1989.
[STY04] Kjetil Storesletten, Christopher I Telmer, and Amir Yaron. Consumption and risk sharing over the life cycle. Journal of Monetary Economics, 51(3):609–633, 2004.
[Sun96]
[Tau86]
[Tow83]
[VL11] Ngo Van Long. Dynamic games in the economics of natural resources: a survey. Dynamic Games and Applications, 1(1):115–148, 2011.
[Woo03] Michael Woodford. Interest and Prices: Foundations of a Theory of Monetary Policy. Princeton University Press, 2003.
Acknowledgements: These lectures have benefitted greatly from comments and suggestion from
our colleagues, students and friends. Special thanks go to Anmol Bhandari, Jeong-Hun Choi,
Chase Coleman, David Evans, Chenghan Hou, Doc-Jin Jang, Spencer Lyon, Qingyin Ma, Matthew
McKay, Tomohito Okabe, Alex Olssen, Nathan Palmer and Yixiao Zhou.
INDEX
AR, 462
ARMA, 456, 459, 462
B
Bellman Equation, 416
Bisection, 121
C
Central Limit Theorem, 212, 219
Intuition, 219
Multivariate Case, 223
cloud computing, 15, 171
amazon ec2, 15
google app engine, 15
Python, 159
pythonanywhere, 15
sagemath cloud, 15
wakari, 15
CLT, 212
Complex Numbers, 460
Continuous State Markov Chains, 349
Covariance Stationary, 457
Covariance Stationary Processes, 456
AR, 458
MA, 458
Cython, 160, 168
D
Data Sources, 147
World Bank, 150
Dynamic Programming, 262, 264
Computation, 266
Shortest Paths, 204
Theory, 264
Unbounded Utility, 265
Value Function Iteration, 265, 266
Dynamic Typing, 161
F
Finite Markov Asset Pricing, 327, 330
Lucas Tree, 330
Finite Markov Chains, 190, 191
Stochastic Matrices, 191
Fixed Point Theory, 368
G
General Linear Processes, 458
Git, 33
H
History Dependent Public Policies, 504
Competitive Equilibrium, 506
Ramsey Timing, 505
Sequence of Governments Timing, 506
Timing Protocols, 505
I
Immutable, 87
Implementability Multiplier Approach, 509
Infinite Horizon Dynamic Programming, 261,
262
Integration, 117, 124
IPython, 18, 31, 152
Debugging, 155
Magics, 153
Reloading Modules, 154
Shell, 31
Timing Code, 153
IPython Notebook, 16, 18, 31
Basics, 20
Figures, 23
Help, 24
nbviewer, 27
Setup, 18
Sharing, 27
K
Kalman Filter, 248
Programming Implementation, 254
Recursive Procedure, 253
Kydland-Prescott Approach, 510
L
Lucas Asset Pricing Model, 364
Computation, 369
Consumers, 366
Dynamic Program, 366
Equilibrium Constraints, 367
Equilibrium Price Function, 367
Pricing, 366
Solving, 368
N
nbviewer, 27
NetworkX, 14
Neumann's Theorem, 188
Newton-Raphson Method, 122
Nonparametric Estimation, 478
Numba, 160, 165
NumPy, 104, 105, 117
Arrays, 105
Arrays (Creating), 107
Arrays (Indexing), 108
Arrays (Methods), 110
Arrays (Operations), 111
Arrays (Shape and Dimension), 106
Comparisons, 113
Matrix Multiplication, 112
Universal Functions, 164
Vectorized Functions, 114
O
Object Oriented Programming, 63
Classes, 66
Key Concepts, 64
Methods, 67
Special Methods, 70
Terminology, 64
On-the-Job Search, 384
Model, 385
Model Features, 385
Parameterization, 385
Programming Implementation, 386
Solving for Policies, 391
Optimal Growth
Model, 262
Policy Function, 271
Policy Function Approach, 263
Optimal Savings, 403
Computation, 405
Problem, 404
Programming Implementation, 407
Optimal Taxation, 487
Optimization, 117, 124
Multivariate, 124
P
Pandas, 13, 137, 138
Accessing Data, 149
DataFrames, 140
Series, 139
parallel computing, 15, 171
copperhead, 15
ipython, 15
pycuda, 15
starcluster, 15
Periodograms, 474
Computation, 476
Interpretation, 475
Permanent Income Model, 334
Halls Representation, 341
Savings Problem, 335
Positive Definite Matrices, 188
Pricing Models, 327
Finite Markov Asset Pricing, 330
Risk Aversion, 328
Risk Neutral, 327
Programming
Dangers, 271
Iteration, 276
Writing Reusable Code, 271
pyMC, 14
pystan, 14
Python, 16
Anaconda, 17
Assertions, 90
Cloud Computing, 159
common uses, 8
Comparison, 57
Conditions, 42
Content, 74
Cython, 168
Data Types, 49
Decorators, 93, 95, 97
Descriptors, 93, 96
Dictionaries, 52
Docstrings, 59
Exceptions, 90
For loop, 39
Functions, 41, 58
Generator Functions, 99
Generators, 98
Handling Errors, 89
Identity, 75
import, 53
Indentation, 40
Interfacing with Fortran, 171
Interpreter, 83
Introductory Example, 34
IO, 53
IPython, 17, 18
Iterables, 79
Iteration, 55, 77
Iterators, 77, 78, 80
keyword arguments, 61
lambda functions, 60
List comprehension, 44
Lists, 38
Logical Expressions, 58
Matplotlib, 126
Methods, 75
Namespace (__builtins__), 85
Namespace (Global), 84
Namespace (Local), 85
Namespace (Resolution), 86
Namespaces, 82
Nuitka, 172
Numba, 165
NumPy, 104
Object Oriented Programming, 63
Objects, 73
Packages, 37
Pandas, 137
Parakeet, 172
Paths, 54
PEP8, 61
Properties, 97
PyPI, 16
PyPy, 172
Pyston, 172
Pythran, 172
Recursion, 102
Runtime Errors, 90
Scientific Libraries, 45
SciPy, 115, 117
Sets, 52
Slicing, 51
syntax and design, 9
Tuples, 50
Type, 73
urllib2, 147
Variable Names, 81
Vectorization, 163
While loop, 40
python, 7
Q
QuantEcon, 27
Installation, 27
Repository, 27
R
Ramsey Problem, 504, 507
Computing, 508
Implementability Multiplier Approach, 509
Kydland-Prescott Approach, 510
Optimal Taxation, 487
Recursive Representation, 512
Time Inconsistency, 517
Rational Expectations Equilibrium, 304
Competitive Equilbrium (w. Adjustment
Costs), 307
Computation, 309
Definition, 306
Planning Problem Approach, 310
Robustness, 415
S
Schelling Segregation Model, 208
scientific programming, 9
Blaze, 16
CVXPY, 16
IPython notebook, 15
Numba, 16
numeric, 10
PyMC, 16
PyTables, 16
scikit-learn, 14
SciPy, 115, 117
Bisection, 121
Fixed Points, 123
Integration, 124
Linear Algebra, 125
Multivariate Root Finding, 123
Newton-Raphson Method, 122
Optimization, 124
Statistics, 118
Smoothing, 474, 478
Spectra, 474
Estimation, 474
Spectra, Estimation
AR(1) Setting, 484
Fast Fourier Transform, 474
Pre-Filtering, 484
Smoothing, 478, 479, 484
Spectral Analysis, 456, 460
Spectral Densities, 461
Spectral Density, 462
interpretation, 462
Inverting the Transformation, 464
Mathematical Theory, 464
Spectral Radius, 188
Static Types, 161
Stationary Distributions, 190, 196
statsmodels, 14
Stochastic Matrices, 191
SymPy, 11
T
Text Editors, 31
U
Unbounded Utility, 265
urllib2, 147
V
Value Function Iteration, 265
Vectorization, 160, 163
Operations on Arrays, 163
Vectors, 173, 174
Inner Product, 175
Linear Independence, 177
Norm, 175
Operations, 174
Span, 176
W
Wakari, 159
White Noise, 457, 461
Wold's Decomposition, 458