
Python for Econometrics

Lecturer: Fabian H. C. Raters

Institute: Econometrics, University of Goettingen

Version: February 14, 2022

© 2022 PyEcon.org. All rights reserved. Python is a trademark of the PSF.


Learning Python for econometrics

Welcome to this course and to the world of Python!

Learning objectives of this course:

Python: The course is about Python programming.
for: You will learn tools and methods.
Econometrics:
    Statistics: Numerical programming in Python.
    applied to: We will use it on examples.
    Economics: In an economic context.

Learning Python for econometrics

Knowledge after completing this course:

You have acquired a basic understanding of programming in general with Python and specific knowledge of working with standard numerical packages.
You are able to study Python in depth and absorb new knowledge for your scientific work with Python.
You know the capabilities and further possibilities for using Python in econometrics.

Learning Python for econometrics

What you should not expect from this course:

A guide on how to install or maintain an application.
An introduction to programming for beginners.
An introduction to professional development tools.
Non-scientific, general-purpose programming (beyond the language essentials).
Little content and little effort...

Course organisation

This course can be seen as an applied lecture:

Lecture:
We try to explain the partly theoretical knowledge of Python through simple, easy-to-understand examples. You can learn the subtleties of the language by reading the literature.

Exercises:
Digital worksheets in the form of Jupyter notebooks with applied tasks are available for each chapter. Sample solutions for all exercises are available in separate notebooks.

Self-tests:
At the end of each of the five chapters there are typical exam questions.

Written exam:
There will be a final exam. This will be a pure multiple-choice exam: 60 questions, 90 minutes.
After successfully passing the exam you will receive 6 ECTS.

Literature

The programming language Python is already well established and very much in trend for numerical applications. Some keywords:

Data science,
Data wrangling,
Machine learning,
Numerical statistics,
...

Recommended literature while following this course:

Learning Python, 5th Edition by Mark Lutz,
Python Crash Course, 2nd Edition by Eric Matthes,
Python Data Science Handbook by Jake VanderPlas,
Python for Data Analysis, 2nd Edition by Wes McKinney,
Python for Finance, 2nd Edition by Yves Hilpisch.

Software: Python 3

We are using Python 3. The migration from Python 2 to version 3 was a major revision, and the new version is no longer backwards compatible with the old one.

Python 3 running [command line]
python --version
## Python 3.9.10

In the normal execution mode, the Python interpreter processes the instructions in the background; in other numerical programming languages such as R, this is known as batch mode. The interpreter executes program code that is usually located in a source code file.

The interpreter can also be started in an interactive mode. This mode is used for testing and analysis, in order to obtain fast results for simple tasks.

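The version check above runs in a shell. As a small sketch, the same information can also be queried from inside a running interpreter through the standard-library module sys; the version actually printed depends on your installation.

```python
import sys

# sys.version_info is a named tuple: (major, minor, micro, releaselevel, serial)
info = sys.version_info
print("Python {}.{}.{}".format(info.major, info.minor, info.micro))

# Tuples compare element-wise, so this checks for at least Python 3.6
print(info >= (3, 6))
```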
Software: IDEs

For everyday work with Python, it would be extremely tedious to make all edits in interactive mode.

There are a number of excellent integrated development environments (IDEs) for Python, three of which are emphasized here:

Jupyter (and IPython)
Spyder (scientific IDE)
PyCharm (by JetBrains)

Of course, you can also use a simple text editor. However, you would probably miss the comfort of an IDE.

Installing, extending and maintaining Python is not trivial at the beginning. Therefore, as a beginner, you are well advised to download and install the Python distribution Anaconda. Bonus: many standard packages are supplied directly, or you can conveniently install them later.

Following this course

In this course, in a numerical and analytical context, we use only Jupyter with the IPython kernel.

That is why we have combined

1 all the code from the slides, and
2 all the exercises and solutions

into interactive Jupyter notebooks that you can use online without having to install software locally on your computer. The GWDG has set up a cloud-based JupyterHub for you.

You can access the working environment with your university credentials at

https://jupyter-cloud.gwdg.de/

create a profile and get started right away, even on your smart devices. However, for now you still have to upload the course notebooks yourself or rewrite the code from scratch.

Notebook workflow

A Jupyter notebook is divided into individual, vertically arranged cells, which can be executed separately:

[Figure: a Jupyter notebook consisting of several cells]

The notebook approach is not novel; it comes from the field of computer algebra software.

Notebook workflow

Under the hood, an interactive Python interpreter called IPython is started.

IPython running [command line]
ipython --version
## 8.0.1

Roughly speaking, this is a greatly enhanced version of the Python 3 interpreter, which has numerous convenient advantages over the "normal" interpreter in interactive mode, e.g.,

printing of return values,
color highlighting, and
magic commands.

Following this course

Finally, we wish you a lot of fun and success with and in this course!

Practice makes perfect!

Contribution and credits:

Fabian H. C. Raters
Eike Manßen
GWDG for the JupyterHub

Table of contents

1 Essential concepts
    1.1 Getting started
    1.2 Procedural programming
    1.3 Object-orientation
2 Numerical programming
    2.1 NumPy package
    2.2 Array basics
    2.3 Linear algebra
3 Data formats and handling
    3.1 Pandas package
    3.2 Series
    3.3 DataFrame
    3.4 Import/Export data
4 Visual illustrations
    4.1 Matplotlib package
    4.2 Figures and subplots
    4.3 Plot types and styles
    4.4 Pandas layers
5 Applications
    5.1 Time series
    5.2 Moving window
    5.3 Financial applications
    5.4 Optimization

Chapter 1: Essential concepts

1.1 Getting started
1.2 Procedural programming
1.3 Object-orientation

Section 1.1

Essential concepts: Getting started

Motivation for learning Python

Python can be described as

a dynamic, strongly typed, multi-paradigm and object-oriented programming language,
for versatile, powerful, elegant and clear programming,
with a general, high-level, multi-platform application scope,
which is being used very successfully in the data science sector and is very much in trend.

Moreover, Python is relatively easy to learn, and its well-designed language supports everyone from novices to professional developers. Much of Python's success is due to a high degree of standardization and a huge community that develops and collectively recognizes conventions and paradigms.

A short history of time

... of the Python era:

The language was first released in 1991 by Guido van Rossum. Its name was based on Monty Python's Flying Circus. Its main distinguishing feature is the novel way of marking code blocks: by indentation.

Indentation example
password = input("I am your bank. Password please: ")
## I am your bank. Password please: sparkasse
if password == "sparkasse":
    print("You successfully logged in!")
else:
    print("Fail. Will call the police!")
## You successfully logged in!

This increases the readability of code and should at the same time encourage the programmer to write neat code. Since source code can be written more compactly in Python, an increase in efficiency in daily work can be expected.

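Because input() waits for a user, the example above cannot run unattended. A minimal sketch that moves the check into a function shows the same indentation rules; the function name login() and its return values are illustrative assumptions, not part of the original example.

```python
def login(password):
    # Everything indented below "def" belongs to the function body
    if password == "sparkasse":
        # This block runs only if the condition is true
        return "You successfully logged in!"
    else:
        # This block runs otherwise
        return "Fail. Will call the police!"


print(login("sparkasse"))
print(login("1234"))
```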
A short history of time

Overview of the Python development by versions and dates:

[Figure: timeline from 1990 to 2025 marking Python's birthday (1991), Python 2.0 (2000), Python 3.0 (2008), "Python 2.7 lives forever" (2010), Python 3.6 (2016), "Python 2.7 will die" (2020) and Python 3.9 (2020)]

In comparison

Comparing the way Python works with common programming languages, we briefly discuss a selection of popular competitors:

C/C++:
CPython is interpreted, not compiled to machine code.
C/C++ are statically typed, complex languages.

Java:
CPython is not compiled just-in-time.
Java has a C-like syntax.

MATLAB:
In Python you primarily follow a scalar way of thinking, while in MATLAB you write matrix-based programs.
In the numerical context, however, Python's matrix view and syntax are very similar to those of MATLAB.
MATLAB is partially compiled just-in-time.

Here, CPython refers to the reference implementation, the "original Python", which is itself implemented in C.

In comparison

R:
In Python you primarily follow a scalar way of thinking, while in R you write vector-based programs.
R has a C-like syntax with additional, novel language concepts.

Stata:
Any comparison would inadequately describe the differences.

Reference semantics:
An extremely important difference between the first two languages, C/C++ and Java, as well as Python itself, on the one hand, and the last three languages on the other, is that the former follow call-by-reference semantics, while MATLAB, R and Stata are call-by-copy.

Further specific differences and similarities to MATLAB and R will be addressed in other parts of this course.

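A short sketch of what reference semantics means in practice: assigning a list to a second name does not copy it, so a change made through one name is visible through the other, and a function can modify a list passed to it. Strictly speaking, Python passes references to objects, which is often called call-by-sharing. The variable and function names are illustrative.

```python
# Two names referring to the same list object
prices = [100, 101, 99]
alias = prices
alias.append(102)
print(prices)            # the change is visible through both names
print(alias is prices)   # True: one object, two names

# An explicit copy creates an independent list
snapshot = prices.copy()
snapshot.append(98)
print(prices)            # unchanged by appending to the copy


def add_price(series, value):
    # The function receives a reference to the caller's list
    series.append(value)


add_price(prices, 97)
print(prices)            # modified by the function
```

In call-by-copy languages such as R, the function would work on a copy, and the caller's data would remain unchanged.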
Versatility – diversity

Python has become extremely popular:

[Figure: growth of major programming languages on Stack Overflow]

Source: https://stackoverflow.blog/2017/09/06/incredible-growth-python/

Versatility – diversity

So, you're on the right track, because who wants to bet on the wrong horse?

[Figure: popularity of Python compared with other major programming languages on Stack Overflow]

Source: https://stackoverflow.blog/2017/09/06/incredible-growth-python/

Versatility – diversity

Areas in which Python is used with great success:

Scripts,
Console applications,
GUI applications,
Game development,
Website development, and
Numerical programming.

Places where Python is used:

[Figure: places where Python is used]

Yet another outline

In this course we will successively gain the following insights:

1 General basics of the language.
2 Numerical programming and handling of data sets.
3 Application to economic and analytical questions.

Section 1.2

Essential concepts: Procedural programming

The first program

Programs can be implemented very quickly; this is a pretty minimal example. You can write this command to a text file of your choice and run it directly on your system:

Hello there
print("Hello there!")
## Hello there!

Only one function, print(), is used (highlighted here like a keyword),
The function displays its argument (a string) on screen,
Arguments are passed to the function in parentheses,
A string must be wrapped in " " or ' ',
No semicolon at the end.

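As a small extension of the example: single and double quotes produce the same string, and print() accepts several comma-separated arguments, which it joins with a space by default. The keyword argument sep changes that separator.

```python
# Single and double quotes create identical strings
greeting = 'Hello there!'
print(greeting == "Hello there!")

# Several arguments are separated by a space by default ...
print("Hello", "there!")
# ... and the separator can be changed with the keyword argument sep
print("Hello", "there!", sep="_")
```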
User input

Let's add a user input to the program:

Hello you
name = input("Please enter your name: ")
## Please enter your name: Angela Merkel
print("Hello " + name + "!")
## Hello Angela Merkel!

The function input() is used for interactive text input,
You can use the equal sign = to assign variables (here: name),
Strings can be joined by the (overloaded) operator +.

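For a script that should run without interaction, a sketch can replace input() with a function parameter; the function name greet() is a hypothetical choice. An f-string produces the same result as joining with +.

```python
def greet(name):
    # Concatenation with the overloaded operator +
    return "Hello " + name + "!"


print(greet("Angela Merkel"))

# The same greeting built with an f-string
name = "Angela Merkel"
print(f"Hello {name}!" == greet(name))
```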
Determining weekdays

We are now trying to find out on which weekday a person was born (Merkel's birthday is 17-07-1954):

Weekday of birth
from datetime import datetime

answer = input("Your birthday (DD-MM-YYYY): ")
## Your birthday (DD-MM-YYYY): 17-07-1954
birthday = datetime.strptime(answer, "%d-%m-%Y")
print("Your birthday was on a " + birthday.strftime("%A") + "!")
## Your birthday was on a Saturday!

It is really easy to import functionality from other modules,
The function strptime() is a method of the class datetime,
Both methods convert between strings and date-time objects: strptime() parses a string into a datetime object, while strftime() formats a datetime object as a string.

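Without input(), the date can be parsed from a fixed string. Note that the weekday name produced by strftime("%A") depends on the active locale; the method weekday() returns a locale-independent number, where Monday is 0 and Sunday is 6.

```python
from datetime import datetime

birthday = datetime.strptime("17-07-1954", "%d-%m-%Y")

# Numeric weekday: Monday == 0, ..., Saturday == 5, Sunday == 6
print(birthday.weekday())

# Weekday name (English under the default "C" locale)
print(birthday.strftime("%A"))

# strftime() also formats dates in other styles, e.g., ISO-like
print(birthday.strftime("%Y-%m-%d"))
```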
Time since birth

And how many days had passed between then and Merkel's 4th swearing-in as Federal Chancellor?

Age in days
someday = datetime.strptime("14-03-2018", "%d-%m-%Y")
print("You are " + str((someday - birthday).days) + " days old!")
## You are 23251 days old!

You can create time differences, i.e., the operator - is overloaded,
The difference is a new object (a timedelta) with its own attributes, such as days,
When using the overloaded operator +, you have to explicitly convert the number of days into a string by means of str().

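The difference of two datetime objects is a timedelta. A short sketch, repeating the dates from above so that it runs on its own, also shows that an f-string removes the need for an explicit str() conversion, and that adding a timedelta moves a date forward.

```python
from datetime import datetime, timedelta

birthday = datetime.strptime("17-07-1954", "%d-%m-%Y")
someday = datetime.strptime("14-03-2018", "%d-%m-%Y")

age = someday - birthday
print(type(age).__name__)              # timedelta
print(f"You are {age.days} days old!")

# Adding a timedelta to a datetime moves the date forward
print((birthday + timedelta(days=age.days)).date())
```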
Time since birth

How many years, months and days do you think that is?

Human readable age
from dateutil.relativedelta import relativedelta

delta = relativedelta(someday, birthday)
print(f"That's {delta.years} years, {delta.months} months "
      f"and {delta.days} days!!")
## That's 63 years, 7 months and 25 days!!

You don't have to keep reinventing the wheel: a wealth of packages and individual modules are freely available,
A lowercase f before "..." provides convenient formatting; there are other options as well,
Two string literals in sequence are implicitly joined together: "That" "'s nice" becomes "That's nice"!

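The f-string is one of several formatting options. As a sketch, the same sentence can be produced with str.format() and with the older %-operator; the numbers are taken from the output above. The last lines show the implicit joining of adjacent string literals.

```python
years, months, days = 63, 7, 25

a = f"That's {years} years, {months} months and {days} days!!"
b = "That's {} years, {} months and {} days!!".format(years, months, days)
c = "That's %d years, %d months and %d days!!" % (years, months, days)
print(a == b == c)

# Adjacent string literals are joined into one string
joined = "That" "'s nice"
print(joined)
```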
Getting help

When working with the interactive interpreter, e.g., in a notebook, you can quickly get useful information about Python objects:

Help system
help(len)
## Help on built-in function len in module builtins:
##
## len(obj, /)
##     Return the number of items in a container.

Alternatively, e.g., for more complex problems, it is best to search directly with your preferred internet search engine.
You can find neat solutions to conventional challenges in the literature.

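Besides help(), two built-in tools are handy for a quick look at an object: the docstring stored in the attribute __doc__, and dir(), which lists an object's attributes and methods. In IPython and Jupyter, appending ? to a name (e.g., len?) shows similar information.

```python
# The docstring that help() displays
print(len.__doc__)

# Public methods of the str type (names starting with "_" filtered out)
methods = [name for name in dir(str) if not name.startswith("_")]
print(methods[:5])
```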
Lexical structure

As with natural language, programming languages have a lexical structure. Source code consists of the smallest possible, indivisible elements, the tokens. In Python you can find the following groups of elements:

Literals
Variables
Operators
Delimiters
Keywords
Comments

These terms give us a rock-solid foundation for exploring the heart of a programming language.

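The standard-library module tokenize shows how Python itself splits a line into tokens. In this sketch (the source line is an arbitrary example), note that tokenize groups operators and delimiters together as OP, and keywords and variable names together as NAME.

```python
import io
import tokenize

source = "total = price * 2  # a comment\n"

# generate_tokens() expects a readline function that returns str lines
tokens = [(tokenize.tok_name[tok.type], tok.string)
          for tok in tokenize.generate_tokens(io.StringIO(source).readline)]

for kind, text in tokens:
    print(kind, repr(text))
```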
Literals and variables

Basically, we distinguish between literals and variables:

Assigning variables with literals
myint = 7
myfloat = 4.0
myboat = "nice"
mybool = True
myfloat = myboat

In this course, we will work with four different literals: integer (7), float (4.0), string ("nice") and boolean (True),
Literals are assigned to variables at runtime,
In Python the data type is derived from the literal and does not have to be declared explicitly,
It is allowed to assign values of different data types to the same variable (name) sequentially,
If we don't assign a literal to any variable, it is lost, since it cannot be referred to later.

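The built-in function type() reveals the data type that Python derives from each literal. This sketch reuses the variables from the example and also shows that rebinding a name to a value of another type is allowed.

```python
myint = 7
myfloat = 4.0
myboat = "nice"
mybool = True

# Print each value together with the name of its type
for value in (myint, myfloat, myboat, mybool):
    print(repr(value), type(value).__name__)

# Rebinding: the name myfloat now refers to a string
myfloat = myboat
print(type(myfloat).__name__)
```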
Operators and delimiters

Most operators and delimiters will be introduced to you during this course. Here is an overview of the operators:

Overview of operators
## +       -       *       /       **      //
## %       @       <<      >>      &       |
## ^       ~       ==      !=      <       >
## <=      >=      and     or      not     in
## not in  is      is not

An overview of the delimiters follows:

Overview of delimiters
## (       )       [       ]       {       }
## ,       :       .       =       ;       ->
## +=      -=      *=      /=      **=     //=
## %=      @=      <<=     >>=     &=      |=
## ^=      '       "       \       @       SPACE

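A few of the listed operators deserve a closer look. A brief sketch of floor division, the modulo operator, chained comparisons, membership tests, and the difference between == (equal values) and is (same object):

```python
print(7 // 2, 7 % 2)       # floor division and remainder: 3 1
print(-7 // 2)             # rounds towards minus infinity: -4
print(divmod(7, 2))        # both at once: (3, 1)

x = 5
print(0 < x < 10)          # chained comparison, same as 0 < x and x < 10

a = [1, 2]
b = [1, 2]
print(a == b, a is b)      # equal values, but two distinct objects
print(3 in a, 3 not in a)  # membership tests: False True
```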
Arithmetic operators

All regular arithmetic operations involving numbers are possible:

Pocket calculator
10 + 5
## 15
100 - 20
## 80
8 / 2
## 4.0
4 * (10 + 20)
## 120
2**3
## 8

The result of dividing two integers with / is a floating point number,
The conventional rules apply: parentheses first, then exponentiation, then multiplication and division, etc.,
The operator ** is used for exponentiation.

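The precedence rules can be checked directly. Note in particular that ** binds more tightly than a unary minus and is evaluated from right to left:

```python
print(2 + 3 * 4)      # multiplication before addition: 14
print((2 + 3) * 4)    # parentheses first: 20
print(2 ** 3 ** 2)    # right to left, i.e., 2 ** 9: 512
print(-2 ** 2)        # parsed as -(2 ** 2): -4
print(7 / 7)          # true division always returns a float: 1.0
```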
Boolean operators

In order to demonstrate the use of logical operators (as well as formatted strings and for-loops), we create a handy table summarizing some important results from boolean algebra:

Logical table
# Create table head
print("a   b   a and b   a or b   not a\n"
      "--------------------------------")

# Loop through the rows
for a in [False, True]:
    for b in [False, True]:
        print(f"{a:1} {b:3} {a and b:6} {a or b:8} {not a:7}")
## a   b   a and b   a or b   not a
## --------------------------------
## 0   0      0        0       1
## 0   1      0        1       1
## 1   0      0        1       0
## 1   1      1        1       0

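The table shows and/or applied to booleans. In general, both operators short-circuit and return one of their operands rather than strictly True or False, which is often used to supply default values. The helper fail() is a hypothetical function that exists only to show that it is never called.

```python
print(0 or "default")        # 0 is falsy, so the right operand is returned
print("value" or "default")  # "value" is truthy, so it is returned
print([] and "never")        # [] is falsy, so and stops and returns []
print(bool(""), bool("a"))   # empty sequences are falsy: False True


def fail():
    raise RuntimeError("not evaluated")


# Short-circuit: the right operand, fail(), is never evaluated
print(False and fail())
```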
Keywords and comments

The programmer explains the structure of his/her program to the interpreter via a restricted set of short commands, the keywords:

Overview of keywords
## and       as        assert    break     class     continue
## def       del       elif      else      except    False
## finally   for       from      global    if        import
## in        is        lambda    None      nonlocal  not
## or        pass      raise     return    True      try
## while     with      yield

There are two ways to make comments:

Provide some comments
# Set variable to something - or nothing?
something = None

"""
I am a docstring!
A multiline string comment hybrid.
I will be useful for describing classes and methods.
"""

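The standard-library module keyword provides the authoritative list for the interpreter you are running; depending on the version, it contains a few more entries than the overview above (since Python 3.7, for example, async and await are reserved keywords). The function describe() below is a hypothetical example of a docstring in its typical place.

```python
import keyword

print(len(keyword.kwlist))
print(keyword.iskeyword("yield"))   # True
print(keyword.iskeyword("print"))   # False: print is a function in Python 3


def describe():
    """Return a short description (this line is the docstring)."""
    return "described"


# A docstring placed first in a function body is stored in __doc__
print(describe.__doc__)
```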
Data types 38
Python offers the following basic data types, which we will use in this course:

Data type   Description
int()       Integers
float()     Floating point numbers
str()       Strings, i.e., Unicode texts
bool()      Boolean, i.e., True or False
list()      List, an ordered, mutable array of objects
tuple()     Tuple, an ordered, immutable array of objects
dict()      Dictionary, an associative array of key-value pairs (insertion-ordered since Python 3.7)
set()       Set, an unordered collection of distinct objects
None        Nothing, emptiness, the void (the only value of type NoneType)

Each data type has its own methods, that is, functions that are applicable specifically to an object of this type.

You will gradually get to know new and more complex data types or object classes.

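To see these types in action, the built-in function type() reports the class of any object. A short sketch with one literal value per row of the table; the values themselves are arbitrary examples:

```python
# One literal value per basic data type
values = [42, 3.14, "text", True, [1, 2], (1, 2), {"a": 1}, {1, 2}, None]
for value in values:
    print(f"{type(value).__name__:>8}: {value!r}")

# Methods are bound to the type: str provides upper(), list provides append()
word = "econometrics"
print(word.upper())
```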
Lists 39
A list is an ordered array of objects, accessible via an index:

Listing tech companies
stocks = ["Google", "Amazon", "Facebook", "Apple"]
stocks[1]
stocks.append("Twitter")
stocks.insert(2, "Microsoft")
stocks.sort()

## ['Google', 'Amazon', 'Facebook', 'Apple']
## Amazon
## ['Google', 'Amazon', 'Facebook', 'Apple', 'Twitter']
## ['Google', 'Amazon', 'Microsoft', 'Facebook', 'Apple', 'Twitter']
## ['Amazon', 'Apple', 'Facebook', 'Google', 'Microsoft', 'Twitter']

New lists are created with [ ] (or list()),
The first element has the index 0,
The data type list() possesses its own methods.

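Note that list methods such as sort() change the list in place and return None, whereas the built-in function sorted() returns a new list. A small sketch contrasting the two, plus two further list methods:

```python
stocks = ["Google", "Amazon", "Facebook", "Apple"]

ordered = sorted(stocks)   # new sorted list; stocks is unchanged
print(ordered)             # ['Amazon', 'Apple', 'Facebook', 'Google']
print(stocks)              # ['Google', 'Amazon', 'Facebook', 'Apple']

result = stocks.sort()     # sorts stocks itself and returns None
print(result)              # None

print(stocks.index("Facebook"))  # 2: position of an element
print(stocks.pop())              # Google: removes and returns the last element
```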
Tuples 40
Tuples are immutable sequences related to lists; unlike lists, they cannot be modified or extended after creation. This loss of flexibility is compensated by advantages in speed and memory usage:

Selecting elements in sequences
lottery = (1, 8, 9, 12, 24, 28)
len(lottery)
lottery[1:3]
lottery[:4]
lottery[-1]
lottery[-2:]

## (1, 8, 9, 12, 24, 28)
## 6
## (8, 9)
## (1, 8, 9, 12)
## 28
## (24, 28)

The same operations are also supported for lists.

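The following sketch illustrates immutability and the memory claim. The size comparison uses sys.getsizeof() and reflects CPython; exact byte counts vary between versions and platforms:

```python
import sys

lottery = (1, 8, 9, 12, 24, 28)

# Tuples cannot be changed after creation
try:
    lottery[0] = 2
except TypeError as err:
    print("TypeError:", err)

# In CPython, a tuple typically needs less memory than a list with the same items
as_list = list(lottery)
print(sys.getsizeof(lottery), sys.getsizeof(as_list))

# Tuple unpacking: the starred name collects the remaining items in a list
first, second, *rest = lottery
print(first, second, rest)  # 1 8 [9, 12, 24, 28]
```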
Dictionaries 41
Dictionaries are associative collections of key-value pairs. The keys must be immutable (more precisely, hashable) and unique:

Internet slang dictionary
slang = {"imho": "in my humble opinion",
         "lol": "laughing out loud",
         "tl;dr": "too long; didn’t read"}
slang["lol"]
slang["gl&hl"] = "good luck & have fun"
slang.keys()
slang.values()

## {'imho': 'in...ion', 'lol': 'la...oud', 'tl;dr': 'to...ead'}
## laughing out loud
## good luck & have fun
## dict_keys(['imho', 'lol', 'tl;dr', 'gl&hl'])
## dict_values([... & have fun'])

The literal syntax for dict() is { } with key: value pairs,
The pairs are iterable; since Python 3.7, dictionaries preserve insertion order.

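Accessing a missing key with square brackets raises a KeyError. A short sketch of the safer get() method, membership tests and iteration over pairs; the abbreviation "brb" is just an example of a missing key:

```python
slang = {"imho": "in my humble opinion",
         "lol": "laughing out loud"}

# slang["brb"] would raise a KeyError; get() returns a default instead
print(slang.get("brb", "unknown abbreviation"))

# Membership tests with `in` check the keys
print("lol" in slang)  # True

# items() yields the key-value pairs in insertion order
for key, value in slang.items():
    print(f"{key}: {value}")
```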
Sets 42
A set is an unordered collection of objects without duplicates:

Set operations
x = {"o", "n", "y", "t"}
y = {"p", "h", "o", "n"}
x & y
x | y
x - y

## {'y', 'n', 't', 'o'}
## {'p', 'n', 'h', 'o'}
## {'n', 'o'}
## {'p', 'y', 'o', 't', 'n', 'h'}
## {'y', 't'}

The literal syntax for set() is { } with comma-separated elements,
Sets give their own meaning to existing operators: & (intersection), | (union) and - (difference),
An empty set is created via set(), because {} already creates an empty dict().

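A few further set operations, sketched with the same two example sets; sorted() is used only to make the printed order deterministic:

```python
x = {"o", "n", "y", "t"}
y = {"p", "h", "o", "n"}

print(sorted(x ^ y))    # symmetric difference: ['h', 'p', 't', 'y']
print({"o", "n"} <= x)  # subset test: True

# {} creates an empty dict, set() an empty set
print(type({}).__name__, type(set()).__name__)  # dict set

# Converting a list to a set drops duplicates
print(set([1, 2, 2, 3, 3, 3]))  # {1, 2, 3}
```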
Comparison operators 43
The <, <=, >, >=, ==, != operators compare the values of two objects and return True or False.

Op.   True only if the value of the left operand is
<     less than the value of the right operand
<=    less than or equal to the value of the right operand
>     greater than the value of the right operand
>=    greater than or equal to the value of the right operand
==    equal to the value of the right operand
!=    not equal to the value of the right operand

The comparison depends on the data types of the objects. For example, "7" == 7 will return False, while 7.0 == 7 will return True.

Numbers are compared arithmetically.
Strings are compared lexicographically, character by character, using their Unicode code points.
Tuples and lists are compared lexicographically by comparing corresponding elements. User-defined classes can define their own comparison behaviour via special methods such as __eq__() and __lt__().

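Lexicographic comparison proceeds element by element until the first difference is found. A few illustrative cases; the example strings and sequences are arbitrary:

```python
print("apple" < "banana")   # True: 'a' comes before 'b'
print("Zebra" < "apple")    # True: uppercase letters have smaller code points
print(ord("Z"), ord("a"))   # 90 97
print((1, 2, 3) < (1, 3))   # True: decided at the second element
print([1, 2] < [1, 2, 0])   # True: a prefix is smaller than the longer sequence
print("7" == 7, 7.0 == 7)   # False True
```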
Comparison operators 44
Comparing examples
x, y = 5, 8
print("x < y is", x < y)
## x < y is True
print("x > y is", x > y)
## x > y is False
print("x == y is", x == y)
## x == y is False
print("x != y is", x != y)
## x != y is True
print("This is", "Name" == "Name", "and not", "Name" == "name")
## This is True and not False

When comparing strings, case matters.

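If case should be ignored, a common approach is to normalize both strings before comparing. A sketch using the built-in str methods lower() and casefold():

```python
a, b = "Name", "name"
print(a == b)                  # False: comparison is case-sensitive
print(a.lower() == b.lower())  # True: compare lowercase versions

# casefold() is a more aggressive variant intended for caseless matching
print("Straße".casefold() == "STRASSE".casefold())  # True
```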
Chaining comparison operators 45
In Python, comparison operators can also be chained.

Chaining comparison examples
x = 5
5 >= x > 4
## True
12 < x < 20
## False
2 < x < 10
## True
2 < x and x < 10  # unchained expression
## True

A chained comparison such as a < b < c is evaluated like a < b and b < c, except that b is evaluated only once.

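The "evaluated only once" detail matters when the middle expression is a function call. A sketch with a hypothetical function middle() that counts how often it is called:

```python
calls = 0

def middle():
    """Hypothetical helper: returns 5 and counts how often it is called."""
    global calls
    calls += 1
    return 5

print(2 < middle() < 10)               # True
print(calls)                           # 1: middle() evaluated once
calls = 0
print(2 < middle() and middle() < 10)  # True
print(calls)                           # 2: middle() evaluated twice
```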
Logical operators 46
There are three logical operators: not, and, or.

Op.       Description
not x     Returns True only if x is False
x and y   Returns True only if x and y are True
x or y    Returns True only if x or y or both are True

Logical operators examples
x, y = 5, 8
(x == 5) and (y == 9)
## False
(x == 5) or (y == 8)
## True
not (x == 4) or (y == 9)
## True

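Two details go beyond the table: and and or stop evaluating as soon as the result is known (short-circuit evaluation), and for non-Boolean operands they return one of the operands rather than True or False. A sketch:

```python
print(0 or "default")   # default: 0 is falsy, so the right operand is returned
print([] and 1 / 0)     # []: and stops at the falsy left operand, 1 / 0 never runs
print(3 and 4)          # 4: both truthy, the last operand is returned
print(not [])           # True: not always returns a bool
```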
Exclusive or 47
In some situations, you need a logical operation that is True only when the operands differ (one is True, the other is False). This task can be solved by combining the logical operators not, and, or, or simply by using !=.

Exclusive or
x, y = 5, 8
((x == 5) and not (y == 8)) or (not (x == 5) and (y == 8))
## False
x = 4
((x == 5) and not (y == 8)) or (not (x == 5) and (y == 8))
## True
(x == 5) != (y == 8)
## True

In many other programming languages, a logical “exclusive or” (xor) operator is explicitly part of the language, but not in Python. For Boolean operands, however, the bitwise operator ^ gives the same result.

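A sketch checking that the long form, != and ^ agree on all four combinations of Boolean operands:

```python
rows = []
for a in (False, True):
    for b in (False, True):
        long_form = (a and not b) or (not a and b)
        # store the inputs and the three xor formulations side by side
        rows.append((a, b, long_form, a != b, a ^ b))

for row in rows:
    print(*row)
```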
Binary numbers 48
Bitwise operators operate on numbers, but instead of treating a number as a single (decimal) value, they operate on its representation as a string of bits, written in binary. A binary number is a number expressed in the base-2 numeral system, also called binary numeral system, which consists of only two distinct symbols: typically 0 (zero) and 1 (one).

Binary numbers
## Decimal: Binary:
## 0: 0
## 1: 1
## 2: 10
## 3: 11
## 4: 100
## 5: 101
## 6: 110
## 7: 111
## 8: 1000
## 9: 1001
## 10: 1010

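The table above can be reproduced with the format specifier b, which writes an integer in base 2; integers can also be written directly as binary literals with the prefix 0b. A sketch:

```python
print(f"{'Decimal':>7}  Binary")
for i in range(11):
    print(f"{i:>7}  {i:b}")   # the format spec 'b' writes i in base 2

print(0b1101001)              # 105: an integer written as a binary literal
```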
Binary numbers 49
How to convert binary numbers to integers (the unknown keywords and language structures will be introduced soon):

Binary to integer
def bintoint(binary):
    binary = binary[::-1]  # reverse so that index i belongs to the digit worth 2**i
    num = 0
    for i in range(len(binary)):
        num += int(binary[i]) * 2**i
    return num

bintoint("1101001")
## 105
int("1101001", 2)  # compare with built-in function
## 105

Binary numbers 50
How to convert integers to binary numbers:

Integers to binary
def inttobin(num):
    binary = ""
    if num != 0:
        while num >= 1:
            if num % 2 == 0:
                binary += "0"
                num = num // 2  # integer division keeps num an int
            else:
                binary += "1"
                num = (num - 1) // 2
    else:
        binary = "0"
    return binary[::-1]  # digits were collected from lowest to highest

inttobin(105)
## '1101001'
bin(105)[2:]  # compare with built-in function
## '1101001'

Bitwise operators 51
Python offers distinct bitwise operators. Some of them are given an entirely different meaning by extensions, e.g., for vectorized operations.

Bit. op.  Description
x << y    Returns x with the bits shifted to the left by y places
x >> y    Returns x with the bits shifted to the right by y places
x & y     Does a bitwise and
x | y     Does a bitwise or
~x        Returns the complement of x, i.e., -x - 1
x ^ y     Does a bitwise exclusive or

Bitwise operators
a, b = 5, 7
c = a & b  # bitwise and
## a: 101
## b: 111
## c: 101
print(c)
## 5

Bitwise operators 52
Bitwise operators
a, b = 5, 7
c = a | b  # bitwise or
## a: 101
## b: 111
## c: 111
print(c)
## 7

a = 13
b = a << 2  # bitwise shift
## a: 1101
## b: 110100

a, b = 35, 37
c = a ^ b  # bitwise exclusive or
## a: 100011
## b: 100101
## c: 000110

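The examples do not cover the right shift and the complement from the operator table. A sketch showing both, together with the arithmetic interpretation of shifts for non-negative integers:

```python
a = 13                       # 0b1101
print(bin(a >> 2))           # 0b11: shifting right by 2 drops the two lowest bits
print(a >> 2 == a // 4)      # True: right shift = floor division by 2**2
print(a << 2 == a * 4)       # True: left shift = multiplication by 2**2
print(~a, -a - 1)            # -14 -14: complement of an int equals -a - 1
```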
Control flow: Conditional statements 53
Python's basic conditional statement is if-elif-else (since Python 3.10, there is also structural pattern matching with match-case):

Computer data sizes
nbytes = 100000000 / 8  # e.g. DSL 100000 (100 Mbit/s), converted from bits to bytes
if nbytes >= 1e9:
    print(f"{nbytes/1e9:6.2f} GByte")
elif nbytes >= 1e6:
    print(f"{nbytes/1e6:6.2f} MByte")
elif nbytes >= 1e3:
    print(f"{nbytes/1e3:6.2f} KByte")
else:
    print(f"{nbytes:6.2f} Byte")
## 12.50 MByte

The variable is named nbytes to avoid shadowing the built-in type bytes.

Control flow structures may be nested in any order:

Nestings
a, b = 2, 3
if a > 1:
    if b > 2:
        pass  # a special keyword for empty blocks

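For simple two-way decisions, Python also offers a compact conditional expression of the form value_if_true if condition else value_if_false. A sketch with a hypothetical threshold of 50 Mbit/s:

```python
speed_mbit = 100

# Conditional expression: evaluates to one of two values
label = "fast" if speed_mbit >= 50 else "slow"  # 50 is a hypothetical threshold
print(label)  # fast

# Equivalent if-else statement
if speed_mbit >= 50:
    label = "fast"
else:
    label = "slow"
```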
Control flow: The for loop 54
Python has two conventional loop statements. The first one is for-in-else:

Total sum
numbers = [7, 3, 4, 5, 6, 15]
y = 0
for i in numbers:
    y += i
print(f"The sum of 'numbers' is {y}.")
## The sum of 'numbers' is 40.

Lists or other collections can also be created dynamically with comprehensions:

Powers of 2
powers = [2 ** i for i in range(11)]
teacher = ["***", "**", "*"]
grades = {star: len(teacher) - len(star) + 1 for star in teacher}
## [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
## {'***': 1, '**': 2, '*': 3}

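For common loop patterns, built-in functions save work: sum() adds up a collection directly, while enumerate() and zip() provide an index or a partner element in each iteration. A sketch; the names list is an arbitrary example:

```python
numbers = [7, 3, 4, 5, 6, 15]
print(sum(numbers))  # 40, same as the explicit loop

# enumerate() yields (index, element) pairs
for i, value in enumerate(numbers[:3]):
    print(i, value)

# zip() iterates over several collections in parallel
names = ["a", "b", "c"]
pairs = {name: value for name, value in zip(names, numbers)}
print(pairs)  # {'a': 7, 'b': 3, 'c': 4}
```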
Control flow: continue and break 55
The keyword continue skips the rest of the current iteration and moves on to the next one:

Continue the loop
for x in ["a", "b", "c"]:
    a = x.upper()
    continue
    print(x)  # never reached
print(a)
## C

Alternatively, break aborts a loop instantly:

Breaking the habit
y = 0
for i in [7, 3, 4, "x", 6, 15]:
    if not isinstance(i, int):
        break
    y += i
print(f"The total sum is {y}.")
## The total sum is 14.

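Replacing break with continue in the second example changes the result: the non-integer element is skipped and the loop carries on with the remaining numbers. A sketch:

```python
y = 0
for i in [7, 3, 4, "x", 6, 15]:
    if not isinstance(i, int):
        continue  # skip "x" but keep looping
    y += i
print(f"The total sum is {y}.")  # The total sum is 35.
```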
Control flow: The while loop 56
For loops where the number of iterations is not known at the beginning, you use while-else.

Have you already noticed the keyword else? Python executes the else branch only if the loop was not terminated by break:

Favorite lottery number
import random

n = 0
favorite = 7
while n < 100:
    n += 1
    draw = random.randint(1, 49)  # e.g. German lottery
    if draw == favorite:
        print("Got my number! :)")
        break
else:
    print("My favorite did not show up! :(")
print(f"I tried {n} times!")
## Got my number! :)
## I tried 10 times!

Since the draws are random, your output will differ.

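The else branch works the same way for for loops. A deterministic sketch with a hypothetical function first_negative() that searches a list:

```python
def first_negative(values):
    """Hypothetical helper: return the first negative value, or None."""
    for v in values:
        if v < 0:
            result = v
            break
    else:
        # executed only if the loop ran to completion without break
        result = None
    return result

print(first_negative([3, 1, -4, 1, -5]))  # -4
print(first_negative([2, 7, 1]))          # None
```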
Functions 57
Functions are defined using the keyword def. The structure of function signature and body is specified by indentation, too:

Drawing lottery numbers
import random

def draw_sample(n, first=1, last=49):
    numbers = list(range(first, last + 1))
    sample = []
    for i in range(n):
        ind = random.randint(0, len(numbers) - 1)
        sample.append(numbers.pop(ind))  # pop() prevents drawing a number twice
    sample.sort()
    return sample

draw_sample(6)
draw_sample(6, 80, 100)
draw_sample(3, first=5)
## [2, 3, 4, 16, 23, 28]
## [82, 84, 94, 95, 99, 100]
## [5, 12, 16]

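The standard library already provides random.sample(), which draws distinct elements without replacement. A sketch of an equivalent, hypothetical helper draw_sample_builtin(); calling random.seed() first makes the draws reproducible:

```python
import random

def draw_sample_builtin(n, first=1, last=49):
    """Hypothetical helper: n distinct, sorted numbers from first..last."""
    return sorted(random.sample(range(first, last + 1), n))

random.seed(42)  # fix the random number generator for reproducible draws
print(draw_sample_builtin(6))
print(draw_sample_builtin(3, first=5))
```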
Functions 58
Functions are objects (of type function, and therefore callable); they can be defined inside other functions, where they form closures, and can be created and used like other objects:

Prime numbers
def primes(n):
    numbers = [2]

    def is_prime(num):
        # the inner function reads numbers from the enclosing scope
        for i in numbers:
            if num % i == 0:
                return False
        return True

    if n < 2:
        return []  # there are no primes below 2
    for i in range(3, n + 1):
        if is_prime(i):
            numbers.append(i)
    return numbers

primes(50)
## [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

Seems weird? We discuss namespaces in the next section.
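Because functions are objects, they can also be returned from and passed to other functions. A sketch of a closure factory with the hypothetical name make_power():

```python
def make_power(exponent):
    """Hypothetical factory: returns a function raising its input to exponent."""
    def power(base):
        return base ** exponent  # exponent is remembered from the enclosing call
    return power

square = make_power(2)
cube = make_power(3)
print(square(4), cube(2))            # 16 8
print(callable(square))              # True
print(type(square).__name__)         # function
print(list(map(square, [1, 2, 3])))  # [1, 4, 9]: functions as arguments
```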
Section 1.3 59
Essential concepts
Object-orientation

Python is object-oriented 60
There are three widely known programming paradigms: procedural, functional and object-oriented programming (OOP). Python supports them all.

You have learned how to handle predefined data types in Python. Actually, we have already encountered classes and instances, take for example dict().

In this section you will learn the basics of dealing with (your own) classes:

1 References
2 Classes
3 Instances
4 Main principles
5 Garbage collection

OOP is a wide field and challenging for beginners. Don't get discouraged and, if you notice gaps in your knowledge, consult the literature.

References 61
When you assign a variable, a reference to an object is set:

Equal but not identical
a = ["Star", "Trek"]
b = ["Star", "Trek"]
c = a
a == b
a == c
a is b
a is c
## ['Star', 'Trek']
## ['Star', 'Trek']
## ['Star', 'Trek']
## True
## True
## False
## True

Two equal but not identical objects are created,
Variables a and c link to the same object,
== compares the values, while is checks whether both names refer to the identical object.

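Identity can be made visible with the built-in function id(), and a change made through one name is visible through every other name bound to the same object. A sketch; "Voyager" is an arbitrary example element:

```python
a = ["Star", "Trek"]
b = ["Star", "Trek"]
c = a

# `is` compares object identities, which id() exposes as integers
print(id(a) == id(c), id(a) == id(b))  # True False

# a change made through c is visible through a: both name the same list
c.append("Voyager")
print(a)  # ['Star', 'Trek', 'Voyager']
print(b)  # ['Star', 'Trek']
```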
Copying objects 62
When we introduced lists, we did not mention that they are a prime example of mutable objects:

Collecting grades
grades = [1.7, 1.3, 2.7, 2.0]
result = grades.append(1.0)
result
grades
finals = grades
finals.remove(2.7)
finals
grades
## None
## [1.7, 1.3, 2.7, 2.0, 1.0]
## [1.7, 1.3, 2.0, 1.0]
## [1.7, 1.3, 2.0, 1.0]

Modifications can be in-place: the object itself is modified, and methods such as append() return None,
Changing an object that is referenced several times can cause (un)intended consequences.

Side effects 63
In Python, arguments are passed by assignment: the parameter becomes another reference to the object that was passed in (often called call-by-object-reference):

Side effects
def last_element(x):
    return x.pop(-1)

a = stocks
last_element(a)
a
## ['Amazon', 'Apple', 'Facebook', 'Google', 'Microsoft', 'Twitter']
## Twitter
## ['Amazon', 'Apple', 'Facebook', 'Google', 'Microsoft']

There are side effects,
Referenced mutable objects might be modified,
Referenced immutable objects cannot be modified; operations on them create new objects.

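Whether a caller sees a change depends on what the function does with its parameter: rebinding the name creates a new local reference, while calling a mutating method changes the shared object. A sketch with two hypothetical functions:

```python
def rebind(x):
    x = x + [99]   # builds a new list; only the local name x points to it
    return x

def mutate(x):
    x.append(99)   # modifies the object the caller passed in

data = [1, 2]
new = rebind(data)
print(data, new)   # [1, 2] [1, 2, 99]
mutate(data)
print(data)        # [1, 2, 99]
```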
Copying objects 64
We are able to make an exact copy of the object:

Copying
def last_element(x):
    y = x.copy()
    return y.pop(-1)

a = stocks
last_element(a)
a
## ['Amazon', 'Apple', 'Facebook', 'Google', 'Microsoft']
## Microsoft
## ['Amazon', 'Apple', 'Facebook', 'Google', 'Microsoft']

We receive a new object,
The new object is not identical to the old one.

Deep and shallow copying 65
However, keep in mind that, in most cases, a method copy() creates a shallow copy, while only a deep copy also duplicates the contents of a mutable object with a nested structure:

Cloning fast food
fastfood = [["burgers", "hot dogs"], ["pizza", "pasta"]]
italian = fastfood.copy()
italian.pop(0)
american = list(fastfood)
american.pop(1)
american[0] = american[0].copy()
fastfood[0][1] = "chicken wings"
fastfood[1][0] = "risotto"
italian
american
## [['risotto', 'pasta']]
## [['burgers', 'hot dogs']]

Both approaches, copy() and list(), create new list objects containing new references to the original sub-lists. But for a deep copy, you have to recursively create duplicates of all its objects.
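The standard-library module copy performs this recursive duplication with copy.deepcopy(). A sketch contrasting it with a shallow copy.copy():

```python
import copy

fastfood = [["burgers", "hot dogs"], ["pizza", "pasta"]]
shallow = copy.copy(fastfood)    # new outer list, shared sub-lists
deep = copy.deepcopy(fastfood)   # sub-lists are duplicated recursively

fastfood[1][0] = "risotto"
print(shallow[1])  # ['risotto', 'pasta']: the sub-list is shared
print(deep[1])     # ['pizza', 'pasta']: an independent duplicate
```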
Classes 66
In Python everything is an object, and more complex objects consist of several other objects.

In OOP, we create objects according to blueprints. These blueprints are called classes and are characterized by two categories of elements:

Attributes: Variables that represent the properties of
    an object, named object attributes, or
    a class, named class attributes.
Methods: Functions that are defined within a class:
    instance methods can access all attributes via the instance, while
    class methods receive only the class and can thus access only class attributes; static methods receive neither.

Every generated object is an instance of such a construction plan.

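The different kinds of attributes and methods can be sketched with a hypothetical Account class; the built-in decorators @classmethod and @staticmethod mark class and static methods:

```python
class Account:
    """Hypothetical example class."""
    interest_rate = 0.02              # class attribute, shared by all instances

    def __init__(self, balance):
        self.balance = balance        # object attribute, one per instance

    def yearly_interest(self):        # instance method: receives the instance
        return self.balance * self.interest_rate

    @classmethod
    def set_rate(cls, rate):          # class method: receives the class
        cls.interest_rate = rate

    @staticmethod
    def as_percent(x):                # static method: receives neither
        return f"{x:.1%}"

savings = Account(1000)
print(savings.yearly_interest())      # 20.0
Account.set_rate(0.05)                # affects every instance
print(savings.yearly_interest())      # 50.0
print(Account.as_percent(Account.interest_rate))  # 5.0%
```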
Class definition 67
Specifically, we want to create rectangle objects and define a separate Rectangle class for them:

Rectangle class
class Rectangle:
    width = 0
    height = 0

    def area(self):
        return self.width * self.height

myrectangle = Rectangle()
myrectangle.width = 10
myrectangle.height = 20
myrectangle.area()
## 200

New classes are defined using the keyword class,
The first parameter of a method, by convention named self, refers to the instance itself.

Class constructor 68
We add a constructor method __init__(), which is called to initialize an object of Rectangle:

Rectangle class with constructor
class Rectangle:
    width = 0
    height = 0

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

myrectangle = Rectangle(15, 30)
myrectangle.area()
## 450

In our example, we use the constructor to set the attributes. Methods with names of the form __fun__() have a special, standardized meaning in Python.
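Two further special methods are __repr__(), which controls how an object is displayed, and __eq__(), which implements the == operator. A sketch extending the Rectangle class under that assumption:

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

    def __repr__(self):
        # used by repr() and, since no __str__() is defined, also by print()
        return f"Rectangle({self.width}, {self.height})"

    def __eq__(self, other):
        # called by the == operator
        if not isinstance(other, Rectangle):
            return NotImplemented
        return (self.width, self.height) == (other.width, other.height)

print(Rectangle(15, 30))                       # Rectangle(15, 30)
print(Rectangle(15, 30) == Rectangle(15, 30))  # True
```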
Class inheritance 69
One of the most important concepts of OOP is inheritance. A class inherits all attributes and methods of its parent class and can add new ones or override existing ones:

Square inherits Rectangle
class Square(Rectangle):

    def __init__(self, length):
        super().__init__(length, length)

    def diagonal(self):
        return (self.width**2 + self.height**2)**0.5

mysquare = Square(15)
print(f"Area: {mysquare.area()}")
print(f"Diagonal length: {mysquare.diagonal():7.4f}")
## Area: 225
## Diagonal length: 21.2132

The methods of the parent class, including the constructor, may be referenced by super().

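Overriding a method and checking the class relationship can be sketched as follows; the method describe() is hypothetical and not part of the original classes:

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

    def describe(self):
        return f"{self.width} x {self.height} rectangle"


class Square(Rectangle):
    def __init__(self, length):
        super().__init__(length, length)

    def describe(self):
        # overrides the parent method but still reuses it via super()
        return f"square with side {self.width} ({super().describe()})"


sq = Square(3)
print(sq.describe())  # square with side 3 (3 x 3 rectangle)
print(sq.area())      # 9: inherited unchanged from Rectangle
print(isinstance(sq, Rectangle), issubclass(Square, Rectangle))  # True True
print([cls.__name__ for cls in Square.__mro__])  # ['Square', 'Rectangle', 'object']
```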
Garbage collection 70
You do not have to worry about memory management in Python. The garbage collector will tidy up for you.

If there are no more references to an object, it is automatically disposed of by the garbage collector:

Garbage collection in action
class Dog:
    def __del__(self):
        print("Woof! The dogcatcher got me! Entering the void.. :(")

# My old dog on a leash
mydog = Dog()
# A new dog is born
newdog = Dog()
# Using my leash for the new dog
mydog = newdog
## Woof! The dogcatcher got me! Entering the void.. :(

The destructor __del__() is executed as the last act before an object gets deleted.

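A weak reference, from the standard-library module weakref, lets you observe an object without keeping it alive. A sketch; the immediate disposal shown here relies on CPython's reference counting, and other Python implementations may collect the object later:

```python
import weakref

class Dog:
    pass

mydog = Dog()
leash = weakref.ref(mydog)  # weak reference: does not keep the dog alive
print(leash() is mydog)     # True: the dog still exists

mydog = Dog()               # the last strong reference to the old dog is gone
print(leash())              # None in CPython: the old dog was disposed of
```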
Error handling in Python 71
Everyone involved in programming will encounter errors of various types. These errors can be stressful and annoying, but being aware of the basic types of errors that can occur gives you the chance to handle them.

Seeing the line SyntaxError may make you think "oh no, I've done everything wrong", but errors are normal and even experienced programmers face them frequently. Hints on error handling:

Dissect the error: Find the line of code that the error message refers to. Much of an error message is often irrelevant to the actual error; in Python, the important information is usually at the end of the error message.
Errors are often oversights: In most cases, the error message will give you the line in your code where the error occurred.
Search the web: If you are not able to fix the errors on your own, copy the error message into a search engine and read through the results. Probably someone else also had this problem and the community already found a solution.

Exceptions versus syntax errors 72
A Python program terminates immediately when it encounters an unhandled error. In Python, errors can be either syntax errors or exceptions. Syntax errors occur when the parser detects an incorrect statement in the Python code. An arrow indicates the position of the syntax error:

Syntax Error
## print("Hello World"))
## File "<stdin>", line 1
## print("Hello World"))
## ^
## SyntaxError: invalid syntax

An exception occurs whenever syntactically correct Python code results in an error:

Exception
a = 0 / 0
## <stdin> in <module>()
## ----> 1 a = 0 / 0
## ZeroDivisionError: division by zero

Exceptions 73
Exceptions appear in different types and the type is printed as a part of the error message. The next example shows three common built-in exceptions:

Frequent exception
0 / 0
## <stdin> in <module>()
## ----> 1 0 / 0
## ZeroDivisionError: division by zero

3 + a
## <stdin> in <module>()
## ----> 1 3 + a
## NameError: name 'a' is not defined

3 + "2"
## <stdin> in <module>()
## ----> 1 3 + "2"
## TypeError: unsupported operand type(s) for +: 'int' and 'str'

A list of all exception classes of the standard library can be found here.

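Exception types are classes arranged in a hierarchy, so catching a parent class also catches its subclasses. A sketch inspecting the three exceptions above via their method resolution order __mro__:

```python
# Every exception type is a class; __mro__ lists the class and its ancestors
for exc in (ZeroDivisionError, NameError, TypeError):
    print(exc.__name__, "->", [cls.__name__ for cls in exc.__mro__[1:]])

# Catching a parent class also catches its subclasses
try:
    1 / 0
except ArithmeticError as err:
    caught = type(err).__name__
print(caught)  # ZeroDivisionError
```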
Exception handling

When an exception occurs and is not handled, the Python interpreter prints an error message and exits. In most situations, however, you do not want your whole program to stop.

The try block lets you test a block of code for errors.
The except block lets you handle the error.

Try and except

try:
    print(abc)
except:
    print("An exception occurred")

## An exception occurred

Without the try block, the statement print(abc) would raise a NameError, because the variable abc is not defined; here, the except block handles the error instead.
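A bare except: catches every exception, including ones you probably did not anticipate (even KeyboardInterrupt). A more explicit variant names the expected type and binds the exception object with as, so that its message can be reused. A short sketch:

```python
try:
    print(abc)
except NameError as err:
    # err is the exception object; str(err) holds its message
    message = f"An exception occurred: {err}"

print(message)  # An exception occurred: name 'abc' is not defined
```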
Exception handling

You can define multiple except blocks, for example, if you want to execute specific code when a particular kind of error occurs:

Multiple except blocks

try:
    print(abc)
except NameError:
    print("Variable abc is not defined")
except:
    print("Something else went wrong")

## Variable abc is not defined

try:
    0 / 0
except NameError:
    print("Variable abc is not defined")
except:
    print("Something else went wrong")

## Something else went wrong
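Several exception types can also share one handler when they are listed in a tuple. Handlers are checked from top to bottom and only the first matching one runs, which is why a bare except: has to come last. The function safe_ratio is a hypothetical example:

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None if one of the expected errors occurs."""
    try:
        return numerator / denominator
    except (ZeroDivisionError, TypeError) as err:
        # One handler covers both exception types listed in the tuple.
        print(f"{type(err).__name__}: {err}")
        return None

print(safe_ratio(10, 4))    # 2.5
print(safe_ratio(10, 0))    # prints "ZeroDivisionError: division by zero", then None
print(safe_ratio(10, "2"))  # prints a TypeError message, then None
```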
Exception handling

Similar to if-else, the else keyword defines a block of code that is executed only if no exception was raised in the try block:

Else block

try:
    print("Hello World")
except:
    print("Something went wrong")
else:
    print("Everything is okay")

## Hello World
## Everything is okay
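The else block is useful for keeping the try block small: only the operation that may fail goes into try, and the follow-up code goes into else, so its own errors are not accidentally caught. A sketch with a hypothetical function parse_and_double:

```python
def parse_and_double(text):
    try:
        value = int(text)  # only the risky conversion is inside try
    except ValueError:
        return "not a number"
    else:
        # Runs only if int(text) succeeded; errors raised here are not caught above.
        return value * 2

print(parse_and_double("21"))   # 42
print(parse_and_double("abc"))  # not a number
```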
Exception handling

The finally block is executed regardless of whether the try block raises an error or not. Hence, you can make sure that certain code always runs:

Finally block

try:
    print(abc)
except:
    print("Something went wrong")
finally:
    print("This will always be displayed")

## Something went wrong
## This will always be displayed

try:
    print("Hello World")
except:
    print("Something went wrong")
finally:
    print("This will always be displayed")

## Hello World
## This will always be displayed
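finally is typically used for clean-up work such as closing files or connections. The sketch below (divide and log are hypothetical names) shows that the finally block runs even when the try or except block leaves the function via return:

```python
log = []

def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None
    finally:
        # Executed on every way out of the try statement, even after a return.
        log.append(f"divide({a}, {b}) finished")

print(divide(1, 2))  # 0.5
print(divide(1, 0))  # None
print(log)           # ['divide(1, 2) finished', 'divide(1, 0) finished']
```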
Raise exception

Built-in exceptions are raised whenever predefined interpreter errors occur. In some situations, you might want to raise exceptions on your own:

The raise keyword is used to raise an exception.

In the following example, an exception is raised if the variable x is less than 0:

Raise exception

x = -3
if x < 0:
    raise Exception("Sorry, 'x' is lower than 0.")

## <stdin> in <module>()
## ----> 3 raise Exception("Sorry, 'x' is lower than 0.")
## Exception: Sorry, 'x' is lower than 0.
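Raising a specific built-in exception class such as ValueError, instead of the generic Exception, lets callers catch exactly that case. A minimal sketch with a hypothetical helper check_non_negative:

```python
def check_non_negative(x):
    """Return x unchanged, or raise a ValueError if x is negative."""
    if x < 0:
        raise ValueError(f"x must be non-negative, got {x}")
    return x

try:
    check_non_negative(-3)
except ValueError as err:
    print("Caught:", err)  # Caught: x must be non-negative, got -3

print(check_non_negative(5))  # 5
```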
EAFP versus LBYL

LBYL: Look before you leap.
EAFP: It is easier to ask forgiveness than it is to get permission.

LBYL and EAFP are two coding styles for dealing with operations that may raise exceptions: LBYL tries to avoid them, EAFP handles them.

In short, with LBYL you first check whether something will succeed and only proceed if it does. With EAFP you do what you expect to work and, if an exception occurs, you deal with it:

LBYL

if x != 0:
    print(10 / x)

EAFP

try:
    print(10 / x)
except ZeroDivisionError:
    pass
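The same contrast applies to dictionary lookups, where a missing key raises a KeyError. The sketch uses a hypothetical prices dictionary; both functions return the same results:

```python
prices = {"apple": 1.2, "pear": 0.8}

def price_lbyl(item):
    # LBYL: check first, then act.
    if item in prices:
        return prices[item]
    return None

def price_eafp(item):
    # EAFP: just try it and handle the failure.
    try:
        return prices[item]
    except KeyError:
        return None

print(price_lbyl("apple"), price_eafp("apple"))  # 1.2 1.2
print(price_lbyl("kiwi"), price_eafp("kiwi"))    # None None
```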
EAFP versus LBYL

So, why use EAFP although it needs more lines of code?

Often, the code is more readable and straightforward.
Explicit is better than implicit (Zen of Python, see below).
Best performance when no exception is raised, because no check is executed in the normal case.
Detailed exception handling: you can consider not only errors in general but also different kinds of errors, and proceed differently for each.

EAFP

try:
    print(10 / x)
except ZeroDivisionError:
    print("Zero division")
except NameError:
    print("Variable 'x' is not defined")
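The readability argument is easiest to see when the check would be complicated. Deciding in advance whether a string is a valid float would have to cover inputs such as '3.5', '-2', '1e-3' or ' 7 ', whereas EAFP simply lets float() decide. to_float is a hypothetical helper:

```python
def to_float(text):
    """Convert text to a float, or return None if that is not possible."""
    try:
        return float(text)
    except ValueError:
        return None

for text in ["3.5", "-2", "1e-3", " 7 ", "abc"]:
    print(repr(text), "->", to_float(text))
```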
Built-in versus user-defined exceptions

Python has many built-in exceptions that terminate your program when something goes wrong. But you can also create custom exceptions that serve specific purposes.

Your own exception can be implemented by defining a new class that derives from the Exception class or one of its subclasses:

User-defined exception

class ValueTooLargeError(Exception):
    """Raised when the input value is too large"""
    pass

x = 3
try:
    if x > 2:
        raise ValueTooLargeError
except ValueTooLargeError:
    print("The number is too large.")

## The number is too large.
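A custom exception can also carry extra information as attributes. This sketch extends the class above with a constructor (check is a hypothetical helper); deriving from ValueError instead of Exception would additionally let callers catch it with except ValueError.

```python
class ValueTooLargeError(Exception):
    """Raised when a value exceeds an allowed maximum."""

    def __init__(self, value, maximum):
        # Pass a readable message to the Exception base class and keep the details.
        super().__init__(f"{value} exceeds the maximum of {maximum}")
        self.value = value
        self.maximum = maximum


def check(x, maximum=2):
    if x > maximum:
        raise ValueTooLargeError(x, maximum)
    return x


try:
    check(3)
except ValueTooLargeError as err:
    print(err)        # 3 exceeds the maximum of 2
    print(err.value)  # 3
```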
Namespaces

We have already come into contact with namespaces in Python many times. These are hierarchically linked layers in which the references (names) to objects are defined. A rough distinction is made between

the global namespace, and
the local namespace.

The global namespace is the outermost environment of a module; its references can be accessed from anywhere in that module, including inside functions.

On the other hand, locally defined references are only known in a local, i.e., internal environment, such as the body of a function.
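Reading a global name from inside a function works directly, but assigning to it creates a new local name. To rebind the global name, Python provides the global statement. A minimal sketch with hypothetical names:

```python
counter = 0  # defined in the global namespace

def read_counter():
    # Reading a global name inside a function needs no declaration.
    return counter

def set_local():
    counter = 100  # creates a new local name; the global counter is untouched
    return counter

def increment_global():
    global counter  # rebinding a global name requires the global statement
    counter += 1
    return counter

print(read_counter())      # 0
print(set_local())         # 100
print(counter)             # 0
print(increment_global())  # 1
print(counter)             # 1
```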
Namespaces

Reference names from the local namespace mask (shadow) the same names in an outer or in the global namespace. Inside multiplier, x is a local name, so rebinding it does not change the global x:

Namespaces

def multiplier(x):
    x = 4 * x
    return x

x = "OH"
x
multiplier("AH")
multiplier(x)
x

## OH
## AHAHAHAH
## OHOHOHOH
## OH
Namespaces

In fact, functions defined in Python are themselves objects that remember and can access the context in which they were created. This concept comes from functional programming and is called a closure:

Closures

def gen_multiplier(a):
    def fun(x):
        return a * x
    return fun

multi1 = gen_multiplier(4)
multi2 = gen_multiplier(5)

multi1
multi1("EH")
multi2("EH")

## <function gen_multiplier.<locals>.fun at 0x127fc4ee0>
## EHEHEHEH
## EHEHEHEHEH
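The remembered context can be inspected through the function's __closure__ attribute. A closure can also keep state that changes between calls if the inner function declares the variable nonlocal; make_counter is a hypothetical example:

```python
def gen_multiplier(a):
    def fun(x):
        return a * x
    return fun

multi1 = gen_multiplier(4)
# The captured value of a is stored in a "cell" attached to the function.
print(multi1.__closure__[0].cell_contents)  # 4

def make_counter():
    count = 0
    def increment():
        nonlocal count  # rebind the variable of the enclosing function
        count += 1
        return count
    return increment

counter_a = make_counter()
counter_b = make_counter()
print(counter_a(), counter_a(), counter_b())  # 1 2 1
```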
Managing code

To provide, maintain and extend modular functionality, Python code is organized in hierarchically nested components:

Packages
Modules
Classes
Functions

This organization is very straightforward and is based on the local namespaces mentioned before.

When you download and use new packages, such as NumPy for numerical programming in the next chapter, the packages are loaded and their namespaces are initialized.

Developing custom packages is an advanced topic. Unlike in some other programming languages, it is not essential for a reasonable code structure in small projects.
Importing modules

Modules provide classes and functions via namespaces. A module is Python code that is executed in its own namespace and whose classes and functions you can import. Basically, there are the following alternatives to import from a module:

Import statements

import datetime
import datetime as dt
from datetime import date, timedelta
from datetime import *

dt.date.today()
dt.timedelta(days=3).days
date.today()
timedelta(weeks=1).days
datetime.now()

In the last case, all public names of the datetime module (those listed in its __all__ attribute if it defines one, otherwise all names not starting with an underscore) are imported into the current namespace. This also rebinds the name datetime from the module to the class datetime.datetime, which is why datetime.now() works.
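The import forms only differ in which names they bind in the current namespace; the module itself is loaded once. The following sketch checks this with the is operator (the dates are arbitrary):

```python
import datetime
import datetime as dt
from datetime import date, timedelta

# "import ... as ..." creates a second name for the very same module object.
print(dt is datetime)          # True
# "from ... import ..." binds the class itself in the current namespace.
print(date is datetime.date)   # True

# Typical usage of the imported classes:
print(date(2022, 2, 14) + timedelta(days=7))  # 2022-02-21
```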
Built-in modules

A Python installation ships with a standard library consisting of built-in modules. These modules provide standardized solutions for many problems that occur in everyday programming: "batteries included". For example, they provide access to system functionality such as file management. The Python docs give an overview of all built-in modules.

Usage of built-in modules

import math
from random import randint

math.pi

## 3.141592653589793

math.factorial(5)

## 120

randint(10, 20)  # random integer from 10 to 20, both included

## 18

The last output is random and will differ from run to run.
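Because the randint output above changes between runs, reproducible results require fixing the seed with random.seed(). The standard library's statistics module also covers basic descriptive statistics. A short sketch with made-up sample values:

```python
import math
import random
import statistics

random.seed(1)  # fixing the seed makes the "random" draws reproducible
draws = [random.randint(10, 20) for _ in range(5)]
random.seed(1)
draws_again = [random.randint(10, 20) for _ in range(5)]
print(draws == draws_again)  # True

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(statistics.mean(sample))    # 5.0
print(statistics.pstdev(sample))  # 2.0 (population standard deviation)
print(math.sqrt(16))              # 4.0
```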
Installing modules

Often you might want to use extended functionality. Python has a large and active community of users who make their developments publicly available under open-source licenses. Packages are containers of modules that can be imported and used within your Python code.

These third-party packages can be installed conveniently with the (command-line) package manager pip. The Python Package Index (PyPI) provides an overview of the thousands of packages available. Basic commands for maintaining, for example, the installation of the package "numpy":

Installing the package: pip install numpy
Upgrading the package: pip install --upgrade numpy
Installing the package locally for the current user: pip install --user numpy
Uninstalling the package: pip uninstall numpy
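To check from within Python whether (and in which version) a package has been installed, the standard library offers importlib.metadata (Python 3.8 or later). installed_version is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

print(installed_version("numpy"))  # a version string, or None if NumPy is not installed
```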
Installing modules

Example: OpenCV is a package for image processing in Python. Here you can see how the installation proceeds in a Unix terminal.

[Figure: installation of OpenCV with pip in a Unix terminal]
Writing modules

Your Python projects will become complex, and you will need to maintain the code properly. Therefore, you can break a large, unwieldy programming task into separate, more manageable modules. Modules can be written in Python itself or in C, but here we keep focusing on the Python language.

Creating modules in Python is very straightforward: a Python module is a file containing Python code, for example:

s = "Hello world!"
l = [1, 2, 3, 5, 5]

def add_one(n):
    return n + 1

File: mymodule.py
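A module file is executed completely when it is imported. Code that should only run when the file is started directly as a script (python mymodule.py) is commonly placed behind the __name__ == "__main__" check, as in this sketch (main is a hypothetical function name):

```python
# Contents of a module file, e.g. mymodule.py
def add_one(n):
    return n + 1

def main():
    # Code that should only run when the file is executed as a script
    print(add_one(41))

if __name__ == "__main__":
    main()  # not executed when the module is imported
```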
Working with modules

If you import the module mymodule, the interpreter searches the directories listed in sys.path (which starts with the directory of the running script or, in an interactive session, the current working directory) for a file mymodule.py, reads and executes its contents and makes its namespace available:

Usage of own modules

import mymodule

mymodule.s
mymodule.l
mymodule.add_one(5)

## Hello world!
## [1, 2, 3, 5, 5]
## 6
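Since the search follows sys.path, a module outside the current directory can be made importable by adding its folder to that list. The sketch below writes a hypothetical module mymodule_demo.py into a temporary folder and imports it with importlib.import_module():

```python
import importlib
import sys
import tempfile
from pathlib import Path

module_source = '''
s = "Hello world!"
l = [1, 2, 3, 5, 5]

def add_one(n):
    return n + 1
'''

with tempfile.TemporaryDirectory() as folder:
    Path(folder, "mymodule_demo.py").write_text(module_source)
    sys.path.insert(0, folder)     # make the folder part of the module search path
    importlib.invalidate_caches()  # the file was created after the interpreter started
    try:
        mymodule_demo = importlib.import_module("mymodule_demo")
    finally:
        sys.path.remove(folder)    # restore the original search path

print(mymodule_demo.s)           # Hello world!
print(mymodule_demo.add_one(5))  # 6
```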
Python packages

Large projects may require more than one module. Packages let you structure modules and their namespaces hierarchically by using the dot notation. In the simplest case, they are folders containing modules and (sub-)packages. Consider the following structure:

mypackage/
    mymodule.py
    somemodule.py

The directory mypackage contains two modules, which we can import separately:

Usage of own package

import mypackage.mymodule
import mypackage.somemodule
mypackage.mymodule.add_one(4)

## 5
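The standard library itself is organized this way: urllib is a package and urllib.parse one of its modules. This sketch shows the dot notation and the equivalent from ... import forms for such a package (the URL is a placeholder):

```python
import urllib.parse               # import a module from a package via dot notation
from urllib import parse          # bind the submodule to a shorter name
from urllib.parse import urljoin  # bind a single function

print(parse is urllib.parse)               # True: one module object, several names
print(hasattr(urllib, "__path__"))         # True: urllib is a package (a folder)
print(hasattr(urllib.parse, "__path__"))   # False: urllib.parse is a module (a file)
print(urljoin("https://example.org/docs/", "slides"))  # https://example.org/docs/slides
```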
Package initialization

If a package directory contains a file __init__.py, its code is executed when the package is imported. The directory mypackage now contains the two modules and the initialization file:

mypackage/
    __init__.py
    mymodule.py
    somemodule.py

The file __init__.py can be empty, but it can also be used for package initialization purposes.
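A common use of __init__.py is to re-export selected names, so that users can write mypackage.add_one instead of mypackage.mymodule.add_one. The sketch builds a hypothetical package mypackage_demo in a temporary folder to show this:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    package = Path(root, "mypackage_demo")
    package.mkdir()
    (package / "mymodule.py").write_text("def add_one(n):\n    return n + 1\n")
    (package / "somemodule.py").write_text("GREETING = 'Hello'\n")
    # __init__.py runs on import; here it re-exports add_one at package level.
    (package / "__init__.py").write_text("from .mymodule import add_one\n")
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        mypackage_demo = importlib.import_module("mypackage_demo")
    finally:
        sys.path.remove(root)

print(mypackage_demo.add_one(4))  # 5, without writing mypackage_demo.mymodule
```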
The Zen of Python

The Zen of Python

import this

## The Zen of Python, by Tim Peters
##
## Beautiful is better than ugly.
## Explicit is better than implicit.
## Simple is better than complex.
## Complex is better than complicated.
## Flat is better than nested.
## Sparse is better than dense.
## Readability counts.
## Special cases aren't special enough to break the rules.
## Although practicality beats purity.
## Errors should never pass silently.
## Unless explicitly silenced.
## In the face of ambiguity, refuse the temptation to guess.
## ...
Further topics

A selection of exciting topics that belong to the advanced basics but are not covered in this lecture:

Dynamic language concepts, such as duck typing,
Further complex container types, such as ChainMap or OrderedDict,
Iterators and generators in detail,
Exception handling beyond the basics: raising exceptions and catching errors,
Debugging, introspection and annotations.
Chapter 2

Numerical programming

2.1 NumPy package
2.2 Array basics
2.3 Linear algebra
Section 2.1

Numerical programming

NumPy package
The NumPy package

The Numerical Python package NumPy provides efficient tools for scientific computing and data analysis:

np.array(): Creates a multidimensional array capable of fast and efficient computations,
Built-in mathematical functions on arrays without writing loops,
Built-in linear algebra functions.

Import NumPy

import numpy as np
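A brief sketch of the three features listed above, using arbitrary numbers: an array, a mathematical function applied to all elements at once, and a linear system solved with np.linalg.solve():

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0, 16.0])
print(np.sqrt(x))   # 1, 2, 3, 4: applied to every element without a loop
print(x.mean())     # 7.5

# Linear algebra: solve the system A @ b = y for b
A = np.array([[2.0, 1.0], [1.0, 3.0]])
y = np.array([3.0, 5.0])
b = np.linalg.solve(A, y)
print(b)            # approximately 0.8 and 1.4
```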
Motivation

Element-wise addition

vec1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
vec2 = np.array(vec1)
vec1 + vec1  # for lists, + concatenates

## [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9]

vec2 + vec2  # for arrays, + adds element-wise

## array([ 2, 4, 6, 8, 10, 12, 14, 16, 18])

for i in range(len(vec1)):  # element-wise addition of lists needs a loop
    vec1[i] += vec1[i]
vec1

## [2, 4, 6, 8, 10, 12, 14, 16, 18]
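The same difference holds for multiplication by a scalar. In pure Python, a list comprehension is the idiomatic replacement for the explicit index loop; a short sketch:

```python
import numpy as np

vec1 = [1, 2, 3]
vec2 = np.array(vec1)

print(vec1 * 2)               # [1, 2, 3, 1, 2, 3]: list repetition
print(vec2 * 2)               # [2 4 6]: element-wise multiplication
print([v + v for v in vec1])  # [2, 4, 6]: element-wise with a list comprehension
```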
Motivation

Matrix multiplication

mat1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mat2 = np.array(mat1)
np.dot(mat2, mat2)

## array([[ 30, 36, 42],
## [ 66, 81, 96],
## [102, 126, 150]])

mat3 = np.zeros([3, 3])
for i in range(3):
    for k in range(3):
        for j in range(3):
            # entry (i, k) accumulates row i of mat1 times column k of mat1
            mat3[i][k] = mat3[i][k] + mat1[i][j] * mat1[j][k]
mat3

## array([[ 30., 36., 42.],
## [ 66., 81., 96.],
## [102., 126., 150.]])
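For comparison, two more compact alternatives to the triple loop: a list comprehension that pairs rows with columns, and the @ matrix multiplication operator, which for 2-D arrays gives the same result as np.dot(). A sketch:

```python
import numpy as np

mat1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Pure Python: zip(*mat1) iterates over the columns of mat1.
product = [[sum(a * b for a, b in zip(row, col)) for col in zip(*mat1)]
           for row in mat1]
print(product)  # [[30, 36, 42], [66, 81, 96], [102, 126, 150]]

# NumPy: the @ operator is equivalent to np.dot for 2-D arrays.
mat2 = np.array(mat1)
print(np.array_equal(mat2 @ mat2, np.array(product)))  # True
```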
Motivation

Time comparison

import time

mat1 = np.random.rand(50, 50)
mat2 = np.array(mat1)

# perf_counter() is a high-resolution clock intended for measuring durations
t = time.perf_counter()
mat3 = np.dot(mat2, mat2)
nptime = time.perf_counter() - t

mat3 = np.zeros([50, 50])
t = time.perf_counter()
for i in range(50):
    for k in range(50):
        for j in range(50):
            mat3[i][k] = mat3[i][k] + mat1[i][j] * mat1[j][k]
pytime = time.perf_counter() - t
times = str(pytime / nptime)

print("NumPy is " + times + " times faster!")

## NumPy is 19.49091343854615 times faster!

The exact factor depends on your hardware and differs from run to run.
Section 2.2

Numerical programming

Array basics
Creating NumPy arrays

np.array(list): Converts a Python list into a NumPy array.
array.ndim: Returns the number of dimensions of the array.
array.shape: Returns the shape of the array as a tuple.

Creation

arr1 = [4, 8, 2]
arr1 = np.array(arr1)
arr2 = np.array([24.3, 0., 8.9, 4.4, 1.65, 45])
arr3 = np.array([[4, 8, 5], [9, 3, 4], [1, 0, 6]])
arr1.ndim

## 1

arr3.shape

## (3, 3)

From now on, the name array refers to a NumPy array created with np.array().
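The same attributes work for arrays with more than two dimensions. A sketch with a small 2 x 2 x 2 array (values arbitrary); size, which counts all elements, is an additional attribute:

```python
import numpy as np

cube = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(cube.ndim)   # 3
print(cube.shape)  # (2, 2, 2)
print(cube.size)   # 8, the total number of elements
print(type(cube))  # <class 'numpy.ndarray'>
```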
Array creation functions

np.arange(start, stop, step): Creates a vector of values from start up to, but excluding, stop with step width step.
np.zeros((rows, columns)): Creates an array with all values set to 0.
np.identity(n): Creates an identity matrix of dimension n.

Creation functions

np.zeros((4, 3))

## array([[0., 0., 0.],
## [0., 0., 0.],
## [0., 0., 0.],
## [0., 0., 0.]])

np.arange(6)

## array([0, 1, 2, 3, 4, 5])

np.identity(3)

## array([[1., 0., 0.],
## [0., 1., 0.],
## [0., 0., 1.]])
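A few more calls illustrating the parameters: arange excludes stop, the step may be negative, zeros accepts a dtype, and np.eye() yields the same identity matrix as np.identity(). A sketch:

```python
import numpy as np

print(np.arange(2, 11, 3))  # 2, 5, 8: the stop value 11 is excluded
print(np.arange(5, 0, -1))  # 5, 4, 3, 2, 1: a negative step counts down
print(np.zeros((2, 3), dtype=int))  # integer zeros instead of the default floats
print(np.array_equal(np.identity(3), np.eye(3)))  # True
```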
Array creation functions

np.linspace(start, stop, n): Creates a vector of n evenly spaced values from start to stop (both included).
np.full((rows, columns), k): Creates an array with all values set to k.

Array creation

np.linspace(0, 80, 5)

## array([ 0., 20., 40., 60., 80.])

np.full((5, 4), 7)

## array([[7, 7, 7, 7],
## [7, 7, 7, 7],
## [7, 7, 7, 7],
## [7, 7, 7, 7],
## [7, 7, 7, 7]])
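np.linspace() has two optional parameters that are handy for grids: retstep=True also returns the spacing, and endpoint=False excludes stop. A sketch with arbitrary bounds:

```python
import numpy as np

grid, step = np.linspace(0, 1, 5, retstep=True)
print(grid)  # five values: 0, 0.25, 0.5, 0.75, 1
print(step)  # 0.25
print(np.linspace(0, 1, 4, endpoint=False))  # 0, 0.25, 0.5, 0.75: stop excluded
print(np.full((2, 2), 0.5))  # a float fill value gives a float array
```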
Array creation functions

np.random.rand(rows, columns): Creates an array of random floats in the interval [0, 1).
np.random.randint(k, size=(rows, columns)): Creates an array of random integers between 0 and k-1.

Array of random numbers

np.random.rand(3, 3)

## array([[0.01014591, 0.55955228, 0.48103055],
## [0.30368877, 0.99078572, 0.61537046],
## [0.83572553, 0.45976471, 0.63241975]])

np.random.randint(10, size=(5, 4))

## array([[7, 9, 7, 8],
## [0, 6, 7, 5],
## [7, 3, 4, 7],
## [9, 4, 4, 8],
## [8, 0, 6, 1]])

The values are random and will differ from run to run.
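Since NumPy 1.17, the documentation recommends the Generator interface created by np.random.default_rng() for new code; passing a seed makes the draws reproducible. A sketch of equivalents for the two functions above (the seed 42 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # Generator with a fixed seed
u = rng.random((3, 3))                # floats in [0, 1), like np.random.rand(3, 3)
k = rng.integers(10, size=(5, 4))     # integers 0..9, like np.random.randint(10, size=(5, 4))

# The same seed reproduces the same draws.
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(u, rng_again.random((3, 3))))  # True
```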
Copy arrays

Reference

arr3

## array([[4, 8, 5],
## [9, 3, 4],
## [1, 0, 6]])

arr = arr3
arr[1, 1] = 777
arr3

## array([[ 4, 8, 5],
## [ 9, 777, 4],
## [ 1, 0, 6]])

arr3[1, 1] = 3  # restore the original value

Reference semantics (comparable to call-by-reference): arr = arr3 binds the name arr to the existing array arr3. Both names refer to the same object, so a change made through arr also appears in arr3.
Copy array

array.copy(): Creates a new array with its own copy of the data (comparable to call-by-value), so changes to the copy do not affect the original.

Copy

arr3

## array([[4, 8, 5],
## [9, 3, 4],
## [1, 0, 6]])

arr = arr3.copy()
arr[1, 1] = 777
arr3

## array([[4, 8, 5],
## [9, 3, 4],
## [1, 0, 6]])

Reference

arr3

## array([[4, 8, 5],
## [9, 3, 4],
## [1, 0, 6]])

arr = arr3
arr[1, 1] = 777
arr3

## array([[ 4, 8, 5],
## [ 9, 777, 4],
## [ 1, 0, 6]])

arr3[1, 1] = 3
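Besides plain assignment, slicing does not create an independent copy either: a basic slice returns a view that shares memory with the original array. The sketch below uses np.shares_memory() to compare the three cases:

```python
import numpy as np

arr3 = np.array([[4, 8, 5], [9, 3, 4], [1, 0, 6]])
alias = arr3        # another name for the same array
view = arr3[1, :]   # a slice is a view on the same data
copy = arr3.copy()  # independent data

print(alias is arr3)                 # True
print(np.shares_memory(view, arr3))  # True
print(np.shares_memory(copy, arr3))  # False

view[1] = 777
print(arr3[1, 1])  # 777: changed through the view
print(copy[1, 1])  # 3: the copy is unaffected
```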
Overview: Array creation functions

Function                    Description
array                       Converts input data (e.g., a list) into a NumPy array
arange(start,stop,step)     Creates an array of evenly spaced values from start to stop (excluded)
ones                        Creates an array containing only ones
zeros                       Creates an array containing only zeros
empty                       Allocates memory without initializing specific values
eye, identity               Creates an N x N identity matrix
linspace                    Creates an array of evenly spaced values
full                        Creates an array with all values set to one number
random.rand                 Creates an array of random floats
random.randint              Creates an array of random integers
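A short sketch of three table entries not demonstrated earlier: ones, eye with a non-square shape and a diagonal offset k, and empty, whose contents must be assigned before use:

```python
import numpy as np

print(np.ones((2, 3)))    # 2 x 3 array of ones (float64)
print(np.eye(3, 4, k=1))  # 3 x 4 array with ones on the first superdiagonal
e = np.empty((2, 2))      # memory is allocated, contents are arbitrary
e[:] = 0.0                # so assign values before using the array
print(e)
```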
Data types of arrays

array.dtype: Returns the data type of the array's elements.
array.astype(np.type): Returns a copy of the array cast to the given type (manual typecast).

Data types

arr1.dtype

## dtype('int64')

arr2.dtype

## dtype('float64')

arr1 = arr1 * 2.5
arr1.dtype

## dtype('float64')

arr1 = (arr1 / 2.5).astype(np.int64)
arr1.dtype

## dtype('int64')

The default integer type is platform-dependent; on Windows with NumPy versions before 2.0, for example, it is int32.
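Casting floats to integers with astype() truncates toward zero rather than rounding, which can be surprising. A sketch with arbitrary values; np.round() rounds halves to the nearest even number:

```python
import numpy as np

x = np.array([1.7, -1.7, 2.5])
print(x.astype(np.int64))            # 1, -1, 2: casting truncates toward zero
print(np.round(x).astype(np.int64))  # 2, -2, 2: round first (2.5 goes to the even 2)
y = np.array([1, 2, 3], dtype=np.float64)  # the type can also be set at creation
print(y.dtype)                       # float64
```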
Array operations

Element-wise operations

Arithmetic operators on NumPy arrays operate element-wise.

Element-wise operations

arr3

## array([[4, 8, 5],
## [9, 3, 4],
## [1, 0, 6]])

arr3 + arr3

## array([[ 8, 16, 10],
## [18, 6, 8],
## [ 2, 0, 12]])

arr3**2

## array([[16, 64, 25],
## [81, 9, 16],
## [ 1, 0, 36]])
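Element-wise operations also work between arrays of different but compatible shapes, a mechanism NumPy calls broadcasting: a scalar or a row vector is stretched to the shape of the matrix. A sketch:

```python
import numpy as np

arr3 = np.array([[4, 8, 5], [9, 3, 4], [1, 0, 6]])
print(arr3 - 1)                       # the scalar is applied to every element
print(arr3 * np.array([1, 10, 100]))  # the row vector is repeated for every row
print(arr3 / arr3.sum(axis=0))        # each column divided by its column sum
```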
Array operations

Matrix multiplication

The operator * applied to arrays does not perform matrix multiplication; it multiplies element-wise.

Element-wise operations

arr3 * arr3

## array([[16, 64, 25],
## [81, 9, 16],
## [ 1, 0, 36]])

arr = np.ones((3, 2))
arr

## array([[1., 1.],
## [1., 1.],
## [1., 1.]])

arr3 * arr  # shapes (3, 3) and (3, 2) are incompatible for element-wise multiplication

## ValueError: operands could not be broadcast together
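The matrix product of these two arrays is well defined, because the number of columns of arr3 matches the number of rows of arr; it is computed with the @ operator (or np.dot()). A sketch:

```python
import numpy as np

arr3 = np.array([[4, 8, 5], [9, 3, 4], [1, 0, 6]])
arr = np.ones((3, 2))
prod = arr3 @ arr  # (3, 3) @ (3, 2) -> (3, 2): a true matrix product
print(prod)        # every entry is the corresponding row sum of arr3
```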
Integer indexing

array[index]: Selects the value at position index from the data (counting starts at 0; negative indices count from the end).

Indexing with an integer

arr = np.arange(10)
arr

## array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

arr[4]

## 4

arr[-1]

## 9
Slicing

array[start : stop : step]: Selects a subset of the data, from start up to but excluding stop, in increments of step.

Slicing in one dimension

arr = np.arange(10)
arr

## array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

arr[3:7]

## array([3, 4, 5, 6])

arr[1:]

## array([1, 2, 3, 4, 5, 6, 7, 8, 9])

Slicing

Slicing in one dimension with steps

arr[:7]

## array([0, 1, 2, 3, 4, 5, 6])

arr[-3:]

## array([7, 8, 9])

arr[::-1]

## array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

arr[::2]

## array([0, 2, 4, 6, 8])

arr[:5:-1]

## array([9, 8, 7, 6])

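NumPy's one-dimensional basic slicing follows the same rules as Python sequences, so the way omitted and negative bounds are resolved (including for a negative step such as `arr[:5:-1]`) can be checked with the built-in `slice.indices`. A minimal sketch; the loop and the list of slices are only for illustration:

```python
import numpy as np

arr = np.arange(10)

# slice.indices(n) turns a slice into explicit (start, stop, step) values
# for a sequence of length n; range() over them lists the selected positions
slices = [slice(None, 7), slice(-3, None), slice(None, None, -1),
          slice(None, None, 2), slice(None, 5, -1)]

for s in slices:
    start, stop, step = s.indices(len(arr))
    positions = list(range(start, stop, step))
    assert arr[s].tolist() == positions
    print(f"{s} -> start={start}, stop={stop}, step={step}: {arr[s]}")
```

For `arr[:5:-1]` the omitted start resolves to the last position 9 and the stop 5 is excluded, which is why the result is `[9, 8, 7, 6]`.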
Slicing

Slicing in higher dimensions

In an n-dimensional array, the element at each index is an (n − 1)-dimensional array.

Indexing rows

arr3

## array([[4, 8, 5],
##        [9, 3, 4],
##        [1, 0, 6]])

vec = arr3[1]
vec

## array([9, 3, 4])

arr3[-1]

## array([1, 0, 6])

Slicing

Slicing in two dimensions

arr3

## array([[4, 8, 5],
##        [9, 3, 4],
##        [1, 0, 6]])

arr3[0:2, 0:2]

## array([[4, 8],
##        [9, 3]])

arr3[2:, :]

## array([[1, 0, 6]])

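The last example returns a two-dimensional result even though it contains a single row. A short sketch of the underlying rule: an integer index removes its axis, while a slice keeps it, even when the slice has length 1.

```python
import numpy as np

arr3 = np.array([[4, 8, 5],
                 [9, 3, 4],
                 [1, 0, 6]])

print(arr3[2].shape)       # (3,)   last row as a 1-D array
print(arr3[2:].shape)      # (1, 3) last row, still 2-D
print(arr3[:, 1].shape)    # (3,)   middle column as a 1-D array
print(arr3[:, 1:2].shape)  # (3, 1) middle column as a column vector
```

Keeping the axis is useful when the result must still behave like a matrix, for example as a column vector in a matrix product.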
Slicing

[Figure: Two-dimensional array slicing. Source: Python for Data Analysis (2017), page 99]

Views on arrays

Selecting by index numbers and slicing, as shown so far, is called basic indexing in NumPy. With basic indexing you get NO COPY of your data but a so-called view on the existing data set, i.e., a different perspective on the same values.

A view on an array can be seen as a reference to a rectangular memory area of its values. A view is intended to

- edit a rectangular part of a matrix, e.g., a sub-matrix, a column, or a single value,
- change the shape of the matrix or the arrangement of its elements, e.g., transpose or reshape a matrix,
- reinterpret the stored bytes with a different data type, e.g., look at a float64 array as int64 (this does not convert the numbers; a value conversion with astype() returns a copy),
- share the same values between different parts of a program.

The crucial point is efficiency: arrays in working memory do not have to be copied again and again for simple index operations, which would otherwise cost considerable extra time and memory.

Creating views implicitly

A view is created automatically when you use basic indexing such as slicing:

Create a view by slicing

column = arr3[:, 1]
column

## array([8, 3, 0])

column.base

## array([[4, 8, 5],
##        [9, 3, 4],
##        [1, 0, 6]])

column[1] = 100
arr3

## array([[  4,   8,   5],
##        [  9, 100,   4],
##        [  1,   0,   6]])

Creating views implicitly

Create a view by slicing

elem = column[1:2]
elem.base

## array([[  4,   8,   5],
##        [  9, 100,   4],
##        [  1,   0,   6]])

elem[0] = 3
arr3

## array([[4, 8, 5],
##        [9, 3, 4],
##        [1, 0, 6]])

- The middle column is a view of the base array referenced by arr3.
- Any change to the values of a view directly affects the base data.
- A view of a view is another view on the same base matrix.

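Whether two arrays share data can also be checked directly with `np.shares_memory`, without inspecting `.base`. A minimal sketch contrasting a sliced view with an explicit copy made by `.copy()` (the names `column_copy` and `unchanged` are illustrative):

```python
import numpy as np

arr3 = np.array([[4, 8, 5],
                 [9, 3, 4],
                 [1, 0, 6]])

column = arr3[:, 1]              # basic indexing: a view on arr3
column_copy = arr3[:, 1].copy()  # explicit copy with its own memory

print(np.shares_memory(column, arr3))       # True
print(np.shares_memory(column_copy, arr3))  # False

column_copy[1] = 100             # changes only the copy
unchanged = arr3[1, 1] == 3      # arr3 still holds the old value
column[1] = 100                  # writes through the view into arr3
print(arr3[1, 1])                # 100
```

Calling `.copy()` is the usual way to protect the original data before modifying a selection.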
Obtaining views explicitly

In addition, an array provides methods and attributes that return a view of its data:

Obtain a view

arr3_t = arr3.T
arr3_t

## array([[4, 9, 1],
##        [8, 3, 0],
##        [5, 4, 6]])

arr3_t.flags.owndata

## False

arr3_r = arr3.reshape(1, 9)
arr3_r

## array([[4, 8, 5, 9, 3, 4, 1, 0, 6]])

arr3_r.flags.owndata

## False

Obtaining views explicitly

Obtain a view

arr3_v = arr3.view()
arr3_v.flags.owndata

## False

- The transposed matrix is a predefined view that is available as the attribute T.
- Reshaping is also just another way of looking at the same data, as long as the memory layout allows it; otherwise reshape() returns a copy.
- The method view() creates a view with an identical representation.

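The caveat about `reshape()` can be made concrete. A sketch, relying on NumPy's documented behavior that a reshape returns a view only when the new shape can be expressed with the existing strides: the transposed array is not stored in row order, so flattening it forces a copy.

```python
import numpy as np

arr3 = np.array([[4, 8, 5],
                 [9, 3, 4],
                 [1, 0, 6]])

flat_view = arr3.reshape(1, 9)      # row-ordered data: a view is possible
flat_from_t = arr3.T.reshape(1, 9)  # transposed layout: NumPy must copy

print(np.shares_memory(flat_view, arr3))    # True
print(np.shares_memory(flat_from_t, arr3))  # False
print(flat_from_t)                          # [[4 9 1 8 3 0 5 4 6]]
```

Because the copy happens silently, code that relies on writing through a reshaped array should check `np.shares_memory` (or `.base`) first.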
Fancy indexing

The behavior described above changes with advanced indexing, i.e., if at least one component of the index tuple is neither a scalar index number nor a slice. The case of fancy indexing (indexing with lists of integers) is described below:

Advanced and basic indexing

arr3

## array([[4, 8, 5],
##        [9, 3, 4],
##        [1, 0, 6]])

arr = arr3[[0, 2], [0, 2]]
arr

## array([4, 6])

arr.base  # None, so nothing is printed: arr does not refer to arr3

Fancy indexing

Advanced and basic indexing

arr = arr3[0:3:2, 0:3:2]
arr

## array([[4, 5],
##        [1, 6]])

arr.base

## array([[4, 8, 5],
##        [9, 3, 4],
##        [1, 0, 6]])

- Contrary to intuition, fancy indexing with arr3[[0, 2], [0, 2]] does not return a (2 × 2) matrix but a vector of the matrix elements (0, 0) and (2, 2). This is a complete copy: a new object, not a view of the original matrix.
- A submatrix (view) with the corner elements of the initial matrix can be obtained with slicing.

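If the (2 × 2) corner submatrix is wanted via index lists, `np.ix_` turns the row list and the column list into an open mesh, so fancy indexing selects every row/column combination. A short sketch comparing it with the slicing approach (variable names are illustrative); the fancy result is again a copy:

```python
import numpy as np

arr3 = np.array([[4, 8, 5],
                 [9, 3, 4],
                 [1, 0, 6]])

corners_fancy = arr3[np.ix_([0, 2], [0, 2])]  # (2, 2) copy
corners_slice = arr3[0:3:2, 0:3:2]            # (2, 2) view

print(corners_fancy)
print(np.array_equal(corners_fancy, corners_slice))  # True
print(np.shares_memory(corners_fancy, arr3))         # False
print(np.shares_memory(corners_slice, arr3))         # True
```

`np.ix_` is handy when the rows or columns to keep are irregular and cannot be written as a slice.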
Boolean arrays

A boolean array is a NumPy array with the boolean values True and False. Such an array can be created by applying a comparison operator to NumPy arrays.

Boolean arrays

bool_arr = (arr3 < 5)
bool_arr

## array([[ True, False, False],
##        [False,  True,  True],
##        [ True,  True, False]])

bool_arr1 = (arr3 == 0)
bool_arr1

## array([[False, False, False],
##        [False, False, False],
##        [False,  True, False]])

Comparisons on arrays can be combined with the bitwise operators & (and), | (or), ^ (exclusive or) and ~ (not), which NumPy redefines to work element-wise.

Boolean arrays

Boolean arrays and bitwise operators

a = np.array([3, 8, 4, 1, 9, 5, 2])
b = np.array([2, 3, 5, 6, 11, 15, 17])
c = (a % 2 == 0) | (b % 3 == 0)  # or
c

## array([False,  True,  True,  True, False,  True,  True])

d = (a > b) ^ (a % 2 == 1)  # exclusive or
d

## array([False,  True, False,  True,  True,  True, False])

c ^ d  # exclusive or

## array([False, False,  True, False,  True, False,  True])

Boolean arrays

The logical functions np.logical_and, np.logical_or, np.logical_xor and np.logical_not work on boolean arrays in the same way as the bitwise operators. The parentheses around each comparison are required because &, | and ^ bind more tightly than the comparison operators.

Indexing with boolean arrays

Boolean arrays can be used to select elements of other NumPy arrays. If x is an array and y is a boolean array of the same shape, then x[y] selects all elements of x for which the corresponding value (at the same position) of y is True.

Indexing with boolean arrays

arr3

## array([[4, 8, 5],
##        [9, 3, 4],
##        [1, 0, 6]])

y = arr3 % 2 == 0
y

## array([[ True,  True, False],
##        [False, False,  True],
##        [False,  True,  True]])

arr3[y]

## array([4, 8, 4, 0, 6])

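Two related tasks come up often: combining several conditions into one mask, and finding where the True entries are. A minimal sketch using `&` for a range condition and `np.nonzero` for the row and column positions of the even entries (the name `mask` is illustrative):

```python
import numpy as np

arr3 = np.array([[4, 8, 5],
                 [9, 3, 4],
                 [1, 0, 6]])

# Values between 3 and 6 (inclusive); each comparison in parentheses
mask = (arr3 >= 3) & (arr3 <= 6)
print(arr3[mask])        # [4 5 3 4 6], taken in row-major order

# Row and column positions of the even entries
rows, cols = np.nonzero(arr3 % 2 == 0)
print(rows)              # [0 0 1 2 2]
print(cols)              # [0 1 2 1 2]
```

The selected values always come back as a flat, one-dimensional array, because a mask can pick an irregular set of positions.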
Conditional indexing

Conditional indexing uses boolean arrays to select subsets of values and avoids explicit loops. When a comparison operator is applied to an array, every element is tested against the logical condition. Consider an application that sets all even numbers to 5:

Find and replace values in arrays

a, b = arr3.copy(), arr3.copy()

# Loop version: test every element individually
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        if a[i, j] % 2 == 0:
            a[i, j] = 5

# Vectorized version: one boolean index
b[b % 2 == 0] = 5
b

## array([[5, 5, 5],
##        [9, 3, 5],
##        [1, 5, 5]])

np.allclose(a, b)

## True

Conditional indexing

Find and replace values in arrays, condition: equal

arr3

## array([[4, 8, 5],
##        [9, 3, 4],
##        [1, 0, 6]])

arr = arr3.copy()
arr[arr == 4] = 100
arr

## array([[100,   8,   5],
##        [  9,   3, 100],
##        [  1,   0,   6]])

In this example, arr == 4 creates a boolean array as described before, which is then used to index the array arr. Every element of arr that is marked True in the boolean index array is set to 100.

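A common econometric use of find-and-replace is clipping outliers at fixed thresholds. The return values and the thresholds of ±0.1 below are made up for illustration; the sketch also checks that `np.clip` gives the same result in a single call.

```python
import numpy as np

# Hypothetical daily returns, made up for this example
returns = np.array([0.02, -0.15, 0.01, 0.30, -0.03])
lower, upper = -0.1, 0.1

capped = returns.copy()          # keep the original data unchanged
capped[capped < lower] = lower   # replace values below the lower bound
capped[capped > upper] = upper   # replace values above the upper bound

print(capped)
print(np.allclose(capped, np.clip(returns, lower, upper)))  # True
```

Working on a copy mirrors the slide's arr3.copy(): the raw series stays available for comparison.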
Best practice: Indexing arrays

Step 1a
Integer indexing array[row index, column index]: Indexing an n-dimensional array with n integer indices returns the single value at this position.

Best practice Step 1a

mat = np.arange(12).reshape((3, 4))
mat

## array([[ 0,  1,  2,  3],
##        [ 4,  5,  6,  7],
##        [ 8,  9, 10, 11]])

mat[2, 2]

## 10

mat[0, -1]

## 3

Keep in mind that, in this case only, the results are not arrays but scalar values!

Best practice: Indexing arrays

Step 1b
Integer indexing array[row index]: In an n-dimensional array, the element at each index is an (n − 1)-dimensional array.

Best practice Step 1b

mat = np.arange(12).reshape((3, 4))
mat

## array([[ 0,  1,  2,  3],
##        [ 4,  5,  6,  7],
##        [ 8,  9, 10, 11]])

mat[2]

## array([ 8,  9, 10, 11])

mat[0]

## array([0, 1, 2, 3])

By specifying only the row index, we obtain arrays that are views.

Best practice: Indexing arrays

Step 2a
Slicing array[start : stop : step]: Slicing can be applied separately to rows and columns.

Best practice Step 2a

mat = np.arange(12).reshape((3, 4))
mat

## array([[ 0,  1,  2,  3],
##        [ 4,  5,  6,  7],
##        [ 8,  9, 10, 11]])

mat[0:2]

## array([[0, 1, 2, 3],
##        [4, 5, 6, 7]])

mat[0:2, ::2]

## array([[0, 2],
##        [4, 6]])

Best practice: Indexing arrays

Step 2b
A frequent task is to get a specific row or column of an array. This can be done easily by indexing and slicing.

Best practice Step 2b

mat

## array([[ 0,  1,  2,  3],
##        [ 4,  5,  6,  7],
##        [ 8,  9, 10, 11]])

row = mat[1]  # get second row
column = mat[:, 2]  # get third column
row

## array([4, 5, 6, 7])

column

## array([ 2,  6, 10])

The bare slice : (as in mat[:, 2]) takes every element along that axis, from the first to the last.

Best practice: Indexing arrays

Step 3
Fancy indexing array[rows list, columns list]: Returns a one-dimensional array with the values at the index pairs specified element-wise by the index lists.

Best practice Step 3

mat = np.arange(12).reshape((3, 4))
mat

## array([[ 0,  1,  2,  3],
##        [ 4,  5,  6,  7],
##        [ 8,  9, 10, 11]])

mat[[1, 2], [1, 2]]

## array([ 5, 10])

mat[[0, -1], [-1]]

## array([ 3, 11])

An index list may also contain just a single element; it is then paired with every entry of the other list, here giving the positions (0, -1) and (-1, -1).

Best practice: Indexing arrays

Step 4
Conditional indexing: When comparison operators are applied to arrays, the boolean operations are evaluated element-wise in a vectorized fashion.

Best practice Step 4

bool_mat = mat > 0
bool_mat

## array([[False,  True,  True,  True],
##        [ True,  True,  True,  True],
##        [ True,  True,  True,  True]])

mat[bool_mat] = 111  # equivalent to mat[mat > 0] = 111
mat

## array([[  0, 111, 111, 111],
##        [111, 111, 111, 111],
##        [111, 111, 111, 111]])

Best practice: Indexing arrays

Step 5
Replacing values in arrays: When assigning new values to a slice of an array, the shape of the slice must be respected.

Best practice Step 5

mat[0] = np.array([3, 2, 1])  # fails because the shapes do not fit

## ValueError: could not broadcast input array from shape (3,) into shape (4,)

mat[2, 3] = 100
mat[:, 0] = np.array([3, 3, 3])
mat

## array([[  3, 111, 111, 111],
##        [  3, 111, 111, 111],
##        [  3, 111, 111, 100]])

mat[1:3, 1:3] = np.array([[0, 0], [0, 0]])
mat

## array([[  3, 111, 111, 111],
##        [  3,   0,   0, 111],
##        [  3,   0,   0, 100]])

Reshaping arrays

array.reshape((rows, columns)): Returns the array with a new shape; the total number of elements must stay the same.
array.resize((rows, columns)): Changes the shape of the array in place to rows x columns and fills new positions with 0.

Reshape

arr = np.arange(15)
arr.reshape((3, 5))

## array([[ 0,  1,  2,  3,  4],
##        [ 5,  6,  7,  8,  9],
##        [10, 11, 12, 13, 14]])

arr = np.arange(15)
arr.resize((3, 7))
arr

## array([[ 0,  1,  2,  3,  4,  5,  6],
##        [ 7,  8,  9, 10, 11, 12, 13],
##        [14,  0,  0,  0,  0,  0,  0]])

Adding and removing elements of arrays

np.append(array, value): Appends value to the end of array.
np.insert(array, index, value): Inserts value before index.
np.delete(array, index, axis): Deletes the row or column at index along axis.

Adding and removing elements

a = np.arange(5)
a = np.append(a, 8)
a = np.insert(a, 3, 77)
print(a)

## [ 0  1  2 77  3  4  8]

a.resize((3, 3))
np.delete(a, 1, axis=0)

## array([[0, 1, 2],
##        [8, 0, 0]])

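`np.append`, `np.insert` and `np.delete` do not modify their input; they return a new array, which is why the example above reassigns `a`. A small sketch (the names `b`, `c`, `m`, `no_col` and `flat` are illustrative) that also shows `np.delete` along `axis=1` and without an axis, where the array is flattened first:

```python
import numpy as np

a = np.arange(5)
b = np.append(a, 8)        # new array with 8 appended
c = np.insert(a, 3, 77)    # new array with 77 placed before index 3
print(a)                   # [0 1 2 3 4], a itself is unchanged
print(b)                   # [0 1 2 3 4 8]
print(c)                   # [ 0  1  2 77  3  4]

m = np.arange(9).reshape(3, 3)
no_col = np.delete(m, 1, axis=1)  # drop the second column
flat = np.delete(m, 1)            # no axis: m is flattened, then index 1 removed
print(no_col)
print(flat)
```

Since each call allocates a new array, growing an array element by element in a loop is slow; collecting values in a list and converting once is usually preferable.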
Combining and splitting

np.concatenate((arr1, arr2), axis): Joins a sequence of arrays along an existing axis.
np.split(array, n): Splits an array into n equal sub-arrays.
np.hsplit(array, n): Splits an array horizontally (column-wise) into n sub-arrays.

Combining and splitting

np.concatenate((a, np.arange(6).reshape(2, 3)), axis=0)

## array([[ 0,  1,  2],
##        [77,  3,  4],
##        [ 8,  0,  0],
##        [ 0,  1,  2],
##        [ 3,  4,  5]])

np.split(np.arange(8), 4)

## [array([0, 1]), array([2, 3]), array([4, 5]), array([6, 7])]

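`np.hsplit` is listed above but not demonstrated. A minimal sketch on an illustrative (3 × 4) matrix: splitting the columns into two blocks and joining them again with `np.concatenate` along `axis=1` restores the original.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)

# Split the four columns into two blocks of two columns each
left, right = np.hsplit(m, 2)
print(left)     # columns 0 and 1
print(right)    # columns 2 and 3

# Joining along axis=1 (the columns) restores the original matrix
restored = np.concatenate((left, right), axis=1)
print(np.array_equal(restored, m))   # True
```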
Transposing arrays

array.T: Returns the transposed array (as a view).

Transpose

arr3

## array([[4, 8, 5],
##        [9, 3, 4],
##        [1, 0, 6]])

arr3.T

## array([[4, 9, 1],
##        [8, 3, 0],
##        [5, 4, 6]])

np.eye(3).T

## array([[1., 0., 0.],
##        [0., 1., 0.],
##        [0., 0., 1.]])

Matrix multiplication

np.dot(arr1, arr2): Performs a matrix multiplication of arr1 and arr2. The @ operator can be used instead of the np.dot() function.

Matrix multiplication

res = np.dot(arr3, np.arange(18).reshape((3, 6)))
res

## array([[108, 125, 142, 159, 176, 193],
##        [ 66,  82,  98, 114, 130, 146],
##        [ 72,  79,  86,  93, 100, 107]])

res2 = arr3 @ np.arange(18).reshape((3, 6))
res2

## array([[108, 125, 142, 159, 176, 193],
##        [ 66,  82,  98, 114, 130, 146],
##        [ 72,  79,  86,  93, 100, 107]])

np.allclose(res, res2)

## True

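In econometrics, the most common combination of `@` and `.T` is the OLS estimator $\hat{\beta} = (X'X)^{-1}X'y$. A minimal sketch on made-up, noise-free data generated as $y = 1 + 2x$, so the estimate should recover the coefficients up to rounding. It solves the normal equations with `np.linalg.solve` rather than forming the inverse explicitly, and cross-checks with `np.linalg.lstsq`; the data and variable names are assumptions for illustration only.

```python
import numpy as np

# Made-up regressor; y is generated without noise as y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 1.0 + 2.0 * x

# Design matrix: a column of ones (constant) next to the regressor
X = np.column_stack((np.ones_like(x), x))

# Normal equations (X'X) beta = X'y, solved without inverting X'X
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)   # approximately [1. 2.]

# Cross-check with NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta, beta_lstsq))   # True
```

Solving the linear system is numerically more stable than computing `np.linalg.inv(X.T @ X)` and multiplying.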
Array functions

Element-wise functions

arr3

## array([[4, 8, 5],
##        [9, 3, 4],
##        [1, 0, 6]])

np.sqrt(arr3)

## array([[2.        , 2.82842712, 2.23606798],
##        [3.        , 1.73205081, 2.        ],
##        [1.        , 0.        , 2.44948974]])

np.exp(arr3)

## array([[5.45981500e+01, 2.98095799e+03, 1.48413159e+02],
##        [8.10308393e+03, 2.00855369e+01, 5.45981500e+01],
##        [2.71828183e+00, 1.00000000e+00, 4.03428793e+02]])

Overview: Element-wise array functions

Function                     Description
abs                          Absolute value of integer and floating-point values
sqrt                         Square root
exp                          Exponential function
log, log10, log2             Natural logarithm, log base 10, log base 2
sign                         Sign (1: positive, 0: zero, -1: negative)
ceil                         Round up to integer
floor                        Round down to integer
rint                         Round to nearest integer
modf                         Return fractional and integral parts as separate arrays
sin, cos, tan, sinh, cosh,   Trigonometric, hyperbolic and inverse
tanh, arcsin, ...            trigonometric functions

Binary functions

Binary

x = np.array([3, -6, 8, 4, 3, 5])
y = np.array([3, 5, 7, 3, 5, 9])
np.maximum(x, y)

## array([3, 5, 8, 4, 5, 9])

np.greater_equal(x, y)

## array([ True, False,  True,  True, False, False])

np.add(x, y)

## array([ 6, -1, 15,  7,  8, 14])

np.mod(x, y)

## array([0, 4, 1, 1, 3, 5])

Overview: Binary functions

Function                     Description
add                          Add elements of arrays
subtract                     Subtract elements in the second array from the first
multiply                     Multiply elements
divide                       Divide elements
power                        Raise elements in the first array to the powers in the second
maximum                      Element-wise maximum
minimum                      Element-wise minimum
mod                          Element-wise modulus
greater, less, equal         Element-wise comparison, returns a boolean array

Data processing

np.meshgrid(array1, array2): Returns coordinate matrices from coordinate vectors.

Evaluate the function $f(x, y) = \sqrt{x^2 + y^2}$ on a grid of 1000 × 1000 points (step 0.01) covering [-5, 5) × [-5, 5)

p = np.arange(-5, 5, 0.01)
x, y = np.meshgrid(p, p)
x

## array([[-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
##        [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
##        [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
##        ...,
##        [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
##        [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
##        [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99]])

Data processing

Evaluate the function $f(x, y) = \sqrt{x^2 + y^2}$ on the grid.

import matplotlib.pyplot as plt

val = np.sqrt(x**2 + y**2)
plt.figure(figsize=(2, 2))
plt.imshow(val, cmap="hot")
plt.colorbar()

## <matplotlib.colorbar.Colorbar object at 0x16984cb80>

Data processing

Evaluate the function $f(x, y) = \sqrt{x^2 + y^2}$ on the grid.

plt.show()

[Figure: heat map of val (colormap "hot") with a colorbar; colorbar ticks at 0, 2, 4, 6]

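With 1000 points per axis the grid is too large to inspect, so here is the same idea on a tiny grid with illustrative vectors `px` and `py` of different lengths. With `np.meshgrid`'s default `indexing="xy"`, the output has one row per y-value and one column per x-value, so entry `[i, j]` belongs to the point `(px[j], py[i])`.

```python
import numpy as np

px = np.array([0.0, 1.0, 2.0])   # 3 x-values
py = np.array([10.0, 20.0])      # 2 y-values

# Default indexing="xy": both outputs have shape (len(py), len(px))
x, y = np.meshgrid(px, py)
print(x)   # every row repeats px
print(y)   # every column repeats py

# Evaluate f(x, y) = sqrt(x^2 + y^2) at all grid points at once
val = np.sqrt(x**2 + y**2)
print(np.isclose(val[1, 2], np.sqrt(px[2]**2 + py[1]**2)))   # True
```

This row/column convention is also what `plt.imshow` expects: rows run along the vertical axis and columns along the horizontal axis.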
Conditional logic

np.where(condition, a, b): Returns, element by element, the value of a where condition is True and the value of b otherwise.

Conditional logic

a = np.array([4, 7, 5, -7, 9, 0])
b = np.array([-1, 9, 8, 3, 3, 3])
cond = np.array([True, True, False, True, False, False])

res = np.where(cond, a, b)
res

## array([ 4,  7,  8, -7,  3,  3])

res = np.where(a <= b, b, a)
res

## array([4, 9, 8, 3, 9, 3])

Conditional logic

Conditional logic, examples

arr3

## array([[4, 8, 5],
##        [9, 3, 4],
##        [1, 0, 6]])

res = np.where(arr3 < 5, 0, arr3)
res

## array([[0, 8, 5],
##        [9, 0, 0],
##        [0, 0, 6]])

even = np.where(arr3 % 2 == 0, arr3, arr3 + 1)
even

## array([[ 4,  8,  6],
##        [10,  4,  4],
##        [ 2,  0,  6]])

Statistical methods

array.mean(): Computes the mean of all array elements.
array.sum(): Computes the sum of all array elements.

Statistical methods

arr3

## array([[4, 8, 5],
##        [9, 3, 4],
##        [1, 0, 6]])

arr3.mean()

## 4.444444444444445

arr3.sum()

## 40

arr3.argmin()  # position in the flattened array

## 7

Overview: Statistical methods

Method          Description
sum             Sum of all array elements
mean            Mean of all array elements
std, var        Standard deviation, variance
min, max        Minimum and maximum value in the array
argmin, argmax  Positions of the minimum and maximum value (in the flattened array by default)
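By default, std and var use ddof=0. This means they divide by n (the population formula). For the sample variance with n - 1 in the denominator, which is common in econometrics, pass ddof=1. The sketch below checks both variants against the textbook formulas and assumes the original arr3.

```python
import numpy as np

arr3 = np.array([[4, 8, 5], [9, 3, 4], [1, 0, 6]])
n = arr3.size  # 9 elements

var_pop = arr3.var()           # ddof=0: divide by n
var_sample = arr3.var(ddof=1)  # ddof=1: divide by n - 1

# Manual check of both formulas
dev_sq = (arr3 - arr3.mean()) ** 2
print(np.isclose(var_pop, dev_sq.sum() / n))               # True
print(np.isclose(var_sample, dev_sq.sum() / (n - 1)))      # True
print(np.isclose(arr3.std(ddof=1), np.sqrt(var_sample)))   # True
```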
Axis

An array has one axis per dimension, so a two-dimensional array has two axes. The first axis (axis=0) runs vertically downwards across the rows. The second axis (axis=1) runs horizontally across the columns.

Axis

arr3

## array([[4, 8, 5],
##        [9, 3, 4],
##        [1, 0, 6]])

arr3.sum(axis=0)

## array([14, 11, 15])

arr3.sum(axis=1)

## array([17, 16,  7])

Summing along axis=0 collapses the rows and yields one value per column. Summing along axis=1 yields one value per row.
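The axis argument is not limited to sum(); the other statistical methods from the overview accept it as well. The sketch below assumes the original, unsorted arr3.

```python
import numpy as np

arr3 = np.array([[4, 8, 5], [9, 3, 4], [1, 0, 6]])

col_max = arr3.max(axis=0)        # one value per column
row_mean = arr3.mean(axis=1)      # one value per row
row_argmax = arr3.argmax(axis=1)  # column position of each row's maximum

print(col_max)     # [9 8 6]
print(row_mean)
print(row_argmax)  # [1 0 2]
```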
Sorting

array.sort(axis): Sorts the array in place along the given axis.

Sorting one-dimensional arrays

arr2

## array([24.3 ,  0.  ,  8.9 ,  4.4 ,  1.65, 45.  ])

arr2.sort()
arr2

## array([ 0.  ,  1.65,  4.4 ,  8.9 , 24.3 , 45.  ])
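Besides the in-place method arr.sort(), NumPy provides the function np.sort(), which returns a sorted copy. The sketch below shows the difference and assumes the arr2 values shown above.

```python
import numpy as np

arr2 = np.array([24.3, 0.0, 8.9, 4.4, 1.65, 45.0])

sorted_copy = np.sort(arr2)  # returns a sorted copy; arr2 is unchanged
print(arr2[0])               # 24.3

result = arr2.sort()         # sorts arr2 itself and returns None
print(result)                # None
print(np.array_equal(arr2, sorted_copy))  # True
```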
Sorting

Sorting two-dimensional arrays

arr3

## array([[4, 8, 5],
##        [9, 3, 4],
##        [1, 0, 6]])

arr3.sort()
arr3

## array([[4, 5, 8],
##        [3, 4, 9],
##        [0, 1, 6]])

arr3.sort(axis=0)
arr3

## array([[0, 1, 6],
##        [3, 4, 8],
##        [4, 5, 9]])

The default axis of sort() is -1, which means sorting along the last axis (here axis 1), so each row is sorted separately. With axis=0, each column is sorted. Because sort() works in place, arr3 itself is changed.
Section 2.3

Numerical programming

Linear algebra
Inverse matrix

Import numpy.linalg

import numpy.linalg as nplin

nplin.inv(array): Computes the inverse matrix.
np.allclose(array1, array2): Returns True if two arrays are element-wise equal within a tolerance.

Note: arr3 was sorted in place on the previous slides. It now equals [[0, 1, 6], [3, 4, 8], [4, 5, 9]].

Inverse

inv = nplin.inv(arr3)
inv

## array([[  4., -21.,  16.],
##        [ -5.,  24., -18.],
##        [  1.,  -4.,   3.]])

np.allclose(np.identity(3), np.dot(inv, arr3))

## True
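For two-dimensional arrays, the @ operator performs the same matrix multiplication as np.dot. The sketch below writes out arr3 as it stands after the in-place sorting. It also shows that nplin.inv raises numpy.linalg.LinAlgError for a singular matrix. The matrix singular is a hypothetical example whose second row is twice the first.

```python
import numpy as np
import numpy.linalg as nplin

# arr3 as it stands after the in-place sorting on the Sorting slides
arr3 = np.array([[0, 1, 6], [3, 4, 8], [4, 5, 9]])
inv = nplin.inv(arr3)

# For 2-D arrays, @ is matrix multiplication, like np.dot
print(np.allclose(inv @ arr3, np.identity(3)))  # True

# A singular matrix (hypothetical example) has no inverse
singular = np.array([[1.0, 2.0], [2.0, 4.0]])
raised = False
try:
    nplin.inv(singular)
except nplin.LinAlgError as err:
    raised = True
    print("LinAlgError:", err)
```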
Matrix functions

nplin.det(array): Computes the determinant.
np.trace(array): Computes the trace.
np.diag(array): Returns the diagonal elements as an array.

Linear algebra functions

nplin.det(arr3)

## -1.0

np.trace(arr3)

## 13

np.diag(arr3)

## array([0, 4, 9])
Eigenvalues and eigenvectors

nplin.eig(array): Returns a tuple of two arrays, the eigenvalues and the eigenvectors. The eigenvectors are normalized to unit length and stored as columns, so eigenvec[:, i] belongs to eigenval[i].

Get eigenvalues and eigenvectors

A = np.array([[3, -1, 0], [2, 0, 0], [-2, 2, -1]])
eigenval, eigenvec = nplin.eig(A)
eigenval

## array([-1.,  1.,  2.])

eigenvec

## array([[ 0.00000000e+00, -4.08248290e-01, -7.07106781e-01],
##        [ 0.00000000e+00, -8.16496581e-01, -7.07106781e-01],
##        [ 1.00000000e+00, -4.08248290e-01,  1.17027782e-17]])
Eigenvalues and eigenvectors

Check eigenvalues and eigenvectors

eigenval * eigenvec

## array([[-0.00000000e+00, -4.08248290e-01, -1.41421356e+00],
##        [-0.00000000e+00, -8.16496581e-01, -1.41421356e+00],
##        [-1.00000000e+00, -4.08248290e-01,  2.34055565e-17]])

np.dot(A, eigenvec)

## array([[ 0.00000000e+00, -4.08248290e-01, -1.41421356e+00],
##        [ 0.00000000e+00, -8.16496581e-01, -1.41421356e+00],
##        [-1.00000000e+00, -4.08248290e-01, -1.17027782e-17]])

Through broadcasting, eigenval * eigenvec multiplies column i of eigenvec by eigenval[i]. Both results therefore agree up to rounding errors. For the first eigenpair:

$$
\begin{pmatrix} 3 & -1 & 0 \\ 2 & 0 & 0 \\ -2 & 2 & -1 \end{pmatrix}
\cdot \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
= (-1) \cdot \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ -1 \end{pmatrix}
$$
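The eigenvectors are the columns of eigenvec, so all eigenpairs can be checked at once with np.allclose. They can also be checked one column at a time. The sketch below uses the matrix A from above.

```python
import numpy as np
import numpy.linalg as nplin

A = np.array([[3, -1, 0], [2, 0, 0], [-2, 2, -1]])
eigenval, eigenvec = nplin.eig(A)

# All eigenpairs at once: A V = V diag(lambda); broadcasting scales column i by eigenval[i]
print(np.allclose(A @ eigenvec, eigenvec * eigenval))  # True

# The same check, one eigenpair (column) at a time
for i in range(len(eigenval)):
    v = eigenvec[:, i]
    print(eigenval[i], np.allclose(A @ v, eigenval[i] * v))
```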
QR decomposition

nplin.qr(array): Conducts a QR decomposition and returns the matrices Q and R as a tuple of arrays.

QR decomposition

Q, R = nplin.qr(arr3)
Q

## array([[ 0.        ,  0.98058068,  0.19611614],
##        [-0.6       ,  0.15689291, -0.78446454],
##        [-0.8       , -0.11766968,  0.58834841]])

R

## array([[ -5.        ,  -6.4       , -12.        ],
##        [  0.        ,   1.0198039 ,   6.07960019],
##        [  0.        ,   0.        ,   0.19611614]])

np.allclose(arr3, np.dot(Q, R))

## True
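A QR decomposition factors a matrix into an orthogonal Q and an upper triangular R. The sketch below checks both properties, using np.triu for the triangular part. It writes out arr3 as it stands after the in-place sorting.

```python
import numpy as np
import numpy.linalg as nplin

# arr3 as it stands after the in-place sorting on the Sorting slides
arr3 = np.array([[0, 1, 6], [3, 4, 8], [4, 5, 9]])
Q, R = nplin.qr(arr3)

print(np.allclose(Q.T @ Q, np.identity(3)))  # True: columns of Q are orthonormal
print(np.allclose(R, np.triu(R)))            # True: R is upper triangular
print(np.allclose(Q @ R, arr3))              # True: Q R reproduces arr3
```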
Linear system

nplin.solve(A, b): Returns the solution x of the linear system Ax = b.

Solve linear systems

b = np.array([7, 4, 8])
x = nplin.solve(A, b)
x

## array([  2.,  -1., -14.])

np.allclose(np.dot(A, x), b)

## True

$$
\begin{aligned}
3x_1 - 1x_2 + 0x_3 &= 7 \\
2x_1 + 0x_2 + 0x_3 &= 4 \\
-2x_1 + 2x_2 - 1x_3 &= 8
\end{aligned}
\quad\longrightarrow\quad
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} 2 \\ -1 \\ -14 \end{pmatrix}
$$
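Mathematically, x = inv(A) b. However, nplin.solve computes x without forming the inverse explicitly, which is generally faster and numerically more stable. The sketch below compares both routes for the system above.

```python
import numpy as np
import numpy.linalg as nplin

A = np.array([[3, -1, 0], [2, 0, 0], [-2, 2, -1]])
b = np.array([7, 4, 8])

x_solve = nplin.solve(A, b)  # solves A x = b directly
x_inv = nplin.inv(A) @ b     # same result via the explicit inverse

print(x_solve)
print(np.allclose(x_solve, x_inv))  # True
```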
Overview: Linear algebra

Function      Description
np.dot        Matrix multiplication
np.trace      Sum of the diagonal elements
np.diag       Diagonal elements as an array
nplin.det     Matrix determinant
nplin.eig     Eigenvalues and eigenvectors
nplin.inv     Inverse matrix
nplin.qr      QR decomposition
nplin.solve   Solve a linear system
Chapter 3

Data formats and handling

3.1 Pandas package
3.2 Series
3.3 DataFrame
3.4 Import/Export data
Section 3.1

Data formats and handling

Pandas package
Pandas

The package pandas is a free software library for Python with the following features:

- data manipulation and analysis,
- DataFrame and Series objects,
- import and export of data from files and the web,
- handling of missing data.

→ It provides high-performance data structures and data analysis tools.
Motivation

With pandas you can import and visualize financial data in only a few lines of code.

Motivation

import pandas as pd
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# Read the CSV file; the first column becomes the index and is parsed as dates
dow = pd.read_csv("data/dji.csv", index_col=0, parse_dates=True)
close = dow["Close"]  # closing prices as a Series
close.plot(ax=ax)     # pandas draws the Series on the given axes
ax.set_xlabel("Date")
ax.set_ylabel("Price")
ax.set_title("DJI")

fig.savefig("out/dji.pdf", format="pdf")  # the folder out/ must already exist
Motivation

[Figure: line plot titled "DJI" of the DJI closing price, with "Date" on the x-axis (2006 to 2018) and "Price" on the y-axis (about 7,500 to 27,500)]
Section 3.2

Data formats and handling

Series
Series

Series are a data structure in pandas:

- a one-dimensional array-like object,
- containing a sequence of values and a corresponding array of labels, called the index,
- the string representation of a Series displays the index on the left and the values on the right,
- the default index consists of the integers 0 through N-1, where N is the length of the Series.

String representation of a Series

## 0     3
## 1     7
## 2    -8
## 3     4
## 4    26
## dtype: int64
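The sketch below shows how the Series displayed above can be created from a plain list without passing an index. The variable name s is hypothetical.

```python
import pandas as pd

# A Series with the values shown above; no index is passed
s = pd.Series([3, 7, -8, 4, 26])
print(s)

# The default index runs from 0 to N - 1, here N = 5
print(list(s.index))  # [0, 1, 2, 3, 4]
print(len(s))         # 5
```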
Create Series

pd.Series(): Creates a one-dimensional array-like object with values and an index.

Importing pandas and creating a Series

import numpy as np
import pandas as pd

obj = pd.Series([2, -5, 9, 4])
obj

## 0    2
## 1   -5
## 2    9
## 3    4
## dtype: int64

- Here the Series is formed from a list only, without specifying an index.
- An index is therefore added automatically.
Create Series

Series indexing vs. NumPy indexing

obj2 = pd.Series([2, -5, 9, 4], index=["a", "b", "c", "d"])
npobj = np.array([2, -5, 9, 4])
obj2

## a    2
## b   -5
## c    9
## d    4
## dtype: int64

obj2["b"]

## -5

npobj[1]

## -5

NumPy arrays are indexed by integer positions. A Series can also be indexed by the labels of its manually set index.
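A Series offers the .loc indexer for labels and the .iloc indexer for integer positions, which makes it explicit which one is meant. Recent pandas versions deprecate interpreting obj2[1] as a position on a Series with a non-integer index, so the explicit forms are the safer choice. A short sketch:

```python
import pandas as pd

obj2 = pd.Series([2, -5, 9, 4], index=["a", "b", "c", "d"])

print(obj2.loc["b"])                  # -5, selected by label
print(obj2.iloc[1])                   # -5, selected by integer position
print(obj2.loc[["a", "d"]].tolist())  # [2, 4]
```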
Create Series

Pandas Series can be created from:

- lists,
- NumPy arrays,
- dicts.

Series creation from NumPy arrays

npobj = np.array([2, -5, 9, 4])
obj2 = pd.Series(npobj, index=["a", "b", "c", "d"])
obj2

## a    2
## b   -5
## c    9
## d    4
## dtype: int64
Create Series

Series from dicts

dictdata = {"Göttingen": 117665, "Northeim": 28920,
            "Hannover": 532163, "Berlin": 3574830}
obj3 = pd.Series(dictdata)
obj3

## Göttingen     117665
## Northeim       28920
## Hannover      532163
## Berlin       3574830
## dtype: int64

- The index of a Series can be set manually.
- Unlike with a NumPy array, the index labels can be used to select single values.
- Data contained in a dict can be passed to a Series. The index of the resulting Series consists of the dict's keys, in insertion order.
Create Series

Dict to Series with manual index

cities = ["Hamburg", "Göttingen", "Berlin", "Hannover"]
obj4 = pd.Series(dictdata, index=cities)
obj4

## Hamburg            NaN
## Göttingen     117665.0
## Berlin       3574830.0
## Hannover      532163.0
## dtype: float64

- When a dict is passed together with an index, the values are looked up by the index labels.
- NaN (not a number) marks a missing value wherever an index label has no matching key in the dict (here Hamburg). Dict keys that are not in the index (here Northeim) are dropped.
- Because NaN is a floating-point value, the dtype changes from int64 to float64.
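The following quick check confirms the behavior described above, reusing dictdata and cities from the slides.

```python
import pandas as pd

dictdata = {"Göttingen": 117665, "Northeim": 28920,
            "Hannover": 532163, "Berlin": 3574830}
cities = ["Hamburg", "Göttingen", "Berlin", "Hannover"]
obj4 = pd.Series(dictdata, index=cities)

print(obj4.dtype)               # float64, because of the NaN for Hamburg
print("Northeim" in obj4)       # False: keys not in cities are dropped
print(pd.isnull(obj4).sum())    # 1 missing value
```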
Series properties

Series.values: Returns the values of a Series.
Series.index: Returns the index of a Series.

Series properties

obj.values

## array([ 2, -5,  9,  4])

obj.index

## RangeIndex(start=0, stop=4, step=1)

obj2.index

## Index(['a', 'b', 'c', 'd'], dtype='object')

- The values and the index of a Series can be accessed separately.
- If no index is specified explicitly, the default index is a RangeIndex.
- RangeIndex inherits from the Index class.
Selecting and manipulating values

Series manipulation

obj2[["c", "d", "a"]]

## c    9
## d    4
## a    2
## dtype: int64

obj2[obj2 < 0]

## b   -5
## dtype: int64

NumPy-like operations can be applied to a Series, for example to filter data, to multiply by a scalar, or to apply math functions. In all cases the link between index labels and values is preserved.
Selecting and manipulating values

Series functions

obj2 * 2

## a     4
## b   -10
## c    18
## d     8
## dtype: int64

np.exp(obj2)["a":"c"]

## a       7.389056
## b       0.006738
## c    8103.083928
## dtype: float64

"c" in obj2

## True

- Mathematical functions applied to a Series act only on its values, not on its index.
- Slicing with labels, as in ["a":"c"], includes the end label "c".
- The in operator checks the index labels, not the values.
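Label-based slicing and positional slicing differ in whether the end point is included. The in operator looks at the index only. A small sketch using obj2:

```python
import pandas as pd

obj2 = pd.Series([2, -5, 9, 4], index=["a", "b", "c", "d"])

by_label = obj2["a":"c"]      # label slice: end label "c" is included
by_position = obj2.iloc[0:2]  # positional slice: end position 2 is excluded

print(by_label.index.tolist())     # ['a', 'b', 'c']
print(by_position.index.tolist())  # ['a', 'b']

print("c" in obj2)       # True: "c" is an index label
print(9 in obj2)         # False: in does not look at the values
print(9 in obj2.values)  # True
```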
Selecting and manipulating values

Series manipulation

obj4["Hamburg"] = 1900000
obj4

## Hamburg      1900000.0
## Göttingen     117665.0
## Berlin       3574830.0
## Hannover      532163.0
## dtype: float64

obj4[["Berlin", "Hannover"]] = [3600000, 1100000]
obj4

## Hamburg      1900000.0
## Göttingen     117665.0
## Berlin       3600000.0
## Hannover     1100000.0
## dtype: float64

- Values can be changed by using the labels in the index.
- Several values can be set in one line.
Detect missing data

pd.isnull(): Returns True where data is missing.
pd.notnull(): Returns True where data is not missing (the opposite of pd.isnull()).

NaN

pd.isnull(obj4)

## Hamburg      False
## Göttingen    False
## Berlin       False
## Hannover     False
## dtype: bool

pd.notnull(obj4)

## Hamburg      True
## Göttingen    True
## Berlin       True
## Hannover     True
## dtype: bool

A value was assigned to Hamburg on the previous slide, so obj4 no longer contains missing values.
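Since obj4 no longer has missing values, the sketch below uses a hypothetical Series s that does contain NaN. It shows what pd.isnull() and pd.notnull() return and how the result can be used as a filter.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, np.nan, 3.0], index=["x", "y", "z"])  # illustrative values

mask = pd.isnull(s)               # True where the value is missing
print(mask.tolist())              # [False, True, False]
print(pd.notnull(s).tolist())     # [True, False, True]
print(s[pd.notnull(s)].tolist())  # [1.5, 3.0], only the non-missing values
```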
Align differently indexed data

Arithmetic between two Series aligns the values by their index labels, and the result contains the union of both indexes. Hamburg (only in obj4) and Northeim (only in obj3) have no counterpart to be added to. They are therefore marked with NaN (not a number).

Data 1

obj3

## Göttingen     117665
## Northeim       28920
## Hannover      532163
## Berlin       3574830
## dtype: int64

Data 2

obj4

## Hamburg      1900000.0
## Göttingen     117665.0
## Berlin       3600000.0
## Hannover     1100000.0
## dtype: float64

Align data

obj3 + obj4

## Berlin       7174830.0
## Göttingen     235330.0
## Hamburg            NaN
## Hannover     1632163.0
## Northeim           NaN
## dtype: float64
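If NaN is not wanted where a label exists in only one Series, the method Series.add() accepts a fill_value that stands in for the missing counterpart. In the sketch below, obj3 and obj4 are rebuilt from the values shown above.

```python
import pandas as pd

obj3 = pd.Series({"Göttingen": 117665, "Northeim": 28920,
                  "Hannover": 532163, "Berlin": 3574830})
obj4 = pd.Series({"Hamburg": 1900000.0, "Göttingen": 117665.0,
                  "Berlin": 3600000.0, "Hannover": 1100000.0})

plain = obj3 + obj4                    # NaN where a label exists in only one Series
filled = obj3.add(obj4, fill_value=0)  # a missing counterpart is treated as 0

print(plain["Northeim"])    # nan
print(filled["Northeim"])   # 28920.0
print(filled["Hamburg"])    # 1900000.0
```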
Naming Series

Series.name: Returns the name of the Series.
Series.index.name: Returns the name of the Series' index.

Naming

obj4.name = "population"
obj4.index.name = "city"
obj4

## city
## Hamburg      1900000.0
## Göttingen     117665.0
## Berlin       3600000.0
## Hannover     1100000.0
## Name: population, dtype: float64

- Assigning to the attribute name changes the name of the existing Series.
- By default, neither the Series nor its index has a name (both are None).
Series vs. NumPy arrays

- NumPy arrays are accessed by their integer positions.
- Series can be accessed by a user-defined index, for example with strings or numbers as labels.
- Different Series are aligned efficiently by their index.
- Series can work with missing values, so operations do not automatically fail.
Section 3.3

Data formats and handling

DataFrame
DataFrame

- DataFrames are the primary data structure of pandas.
- A DataFrame represents a table of data with an ordered collection of columns.
- Each column can have a different data type.
- A DataFrame can be thought of as a dict of Series sharing the same index.
- Physically, a DataFrame is two-dimensional, but by using hierarchical indexing it can represent higher-dimensional data.

String representation of a DataFrame

##    company   price   volume
## 0  Daimler   69.20  4456290
## 1     E.ON    8.11  3667975
## 2  Siemens  110.92  3669487
## 3     BASF   87.28  1778058
## 4      BMW   87.81  1824582
DataFrame

pd.DataFrame(): Creates a DataFrame, a two-dimensional tabular structure with labeled axes (rows and columns).

Creating a DataFrame

data = {"company": ["Daimler", "E.ON", "Siemens", "BASF", "BMW"],
        "price": [69.2, 8.11, 110.92, 87.28, 87.81],
        "volume": [4456290, 3667975, 3669487, 1778058, 1824582]}

frame = pd.DataFrame(data)
frame

##    company   price   volume
## 0  Daimler   69.20  4456290
## 1     E.ON    8.11  3667975
## 2  Siemens  110.92  3669487
## 3     BASF   87.28  1778058
## 4      BMW   87.81  1824582

- In this example, the DataFrame frame is constructed by passing a dict of equal-length lists.
- Instead of a dict of lists, it is also possible to pass a dict of NumPy arrays.
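The sketch below shows the dict-of-NumPy-arrays variant mentioned above, using the same data. The names data_np and frame_np are hypothetical.

```python
import numpy as np
import pandas as pd

# Same data as above, but each column is given as a NumPy array
data_np = {"company": np.array(["Daimler", "E.ON", "Siemens", "BASF", "BMW"]),
           "price": np.array([69.2, 8.11, 110.92, 87.28, 87.81]),
           "volume": np.array([4456290, 3667975, 3669487, 1778058, 1824582])}

frame_np = pd.DataFrame(data_np)
print(frame_np.shape)           # (5, 3)
print(list(frame_np.columns))   # ['company', 'price', 'volume']
print(frame_np["price"].dtype)  # float64
```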
Show DataFrames

Print DataFrame

frame2 = pd.DataFrame(data, columns=["company", "volume",
                                     "price", "change"])
frame2

##    company   volume   price  change
## 0  Daimler  4456290   69.20     NaN
## 1     E.ON  3667975    8.11     NaN
## 2  Siemens  3669487  110.92     NaN
## 3     BASF  1778058   87.28     NaN
## 4      BMW  1824582   87.81     NaN

- If a column is passed that is not contained in the dict, it is filled with NaN.
- As with Series, the default index is assigned automatically.
Inputs to DataFrame constructor

Type                               Description
2D NumPy array                     A matrix of data
dict of arrays, lists, or tuples   Each sequence becomes a column
dict of Series                     Each value becomes a column
dict of dicts                      Each inner dict becomes a column
List of dicts or Series            Each item becomes a row
List of lists or tuples            Treated like a 2D NumPy array
Another DataFrame                  The DataFrame's indexes are used unless different ones are passed
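The sketch below shows three of the constructor inputs from the table. The numbers and the labels x, y, r1 and r2 are purely illustrative.

```python
import numpy as np
import pandas as pd

# 2D NumPy array: a matrix of data (column labels passed explicitly)
df_arr = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["x", "y"])

# dict of dicts: each inner dict becomes a column, inner keys become the index
df_dd = pd.DataFrame({"x": {"r1": 1, "r2": 3}, "y": {"r1": 2, "r2": 4}})

# list of dicts: each dict becomes a row
df_ld = pd.DataFrame([{"x": 1, "y": 2}, {"x": 3, "y": 4}])

print(df_arr.values.tolist())  # [[1, 2], [3, 4]]
print(df_dd.loc["r2", "y"])    # 4
print(df_ld.loc[1, "x"])       # 3
```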
Indexing and adding DataFrames

Add data to DataFrame

frame2["change"] = [1.2, -3.2, 0.4, -0.12, 2.4]
frame2["change"]

## 0    1.20
## 1   -3.20
## 2    0.40
## 3   -0.12
## 4    2.40
## Name: change, dtype: float64

- Selecting a column of a DataFrame returns a Series.
- Attribute-like access, e.g., frame2.change, is also possible.
- The returned Series has the same index as the initial DataFrame.
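Attribute-like access only works for column names that are valid Python identifiers and do not clash with existing DataFrame attributes or methods. A name such as "market cap" needs brackets, and new columns should also be created with brackets. The sketch below uses a small, hypothetical DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"price": [69.2, 8.11], "market cap": [1.0, 2.0]})  # illustrative values

print(df.price.equals(df["price"]))  # True: attribute and bracket access agree
print(df["market cap"].tolist())     # [1.0, 2.0], only bracket access works here

# Create new columns with brackets
df["change"] = [1.2, -3.2]
print(list(df.columns))              # ['price', 'market cap', 'change']
```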
Indexing DataFrames

Indexing DataFrames

frame2[["company", "change"]]

##    company  change
## 0  Daimler    1.20
## 1     E.ON   -3.20
## 2  Siemens    0.40
## 3     BASF   -0.12
## 4      BMW    2.40

- Indexing with a list of several columns returns a DataFrame.
- The returned DataFrame has the same index as the initial one.
Changing DataFrames

del DataFrame[column]: Deletes a column from the DataFrame in place.

DataFrame delete column

del frame2["volume"]
frame2

##    company   price  change
## 0  Daimler   69.20    1.20
## 1     E.ON    8.11   -3.20
## 2  Siemens  110.92    0.40
## 3     BASF   87.28   -0.12
## 4      BMW   87.81    2.40

frame2.columns

## Index(['company', 'price', 'change'], dtype='object')
Naming DataFrames

Naming properties

frame2.index.name = "number:"
frame2.columns.name = "feature:"
frame2

## feature:  company   price  change
## number:
## 0         Daimler   69.20    1.20
## 1            E.ON    8.11   -3.20
## 2         Siemens  110.92    0.40
## 3            BASF   87.28   -0.12
## 4             BMW   87.81    2.40

In DataFrames there is no default name for the index or the columns.
Reindexing

DataFrame.reindex(): Creates a new DataFrame with the data conformed to a new index. The initial DataFrame is not changed.

Reindexing

frame3 = frame.reindex([0, 2, 3, 4])
frame3

##    company   price   volume
## 0  Daimler   69.20  4456290
## 2  Siemens  110.92  3669487
## 3     BASF   87.28  1778058
## 4      BMW   87.81  1824582

- Index labels that are not present in the original DataFrame are filled with NaN by default.
- There are many options for filling missing values.
Reindexing

Filling missing values

frame4 = frame.reindex(index=[0, 2, 3, 4, 5], fill_value=0,
                       columns=["company", "price", "market cap"])
frame4

##    company   price  market cap
## 0  Daimler   69.20           0
## 2  Siemens  110.92           0
## 3     BASF   87.28           0
## 4      BMW   87.81           0
## 5        0    0.00           0

frame4 = frame.reindex(index=[0, 2, 3, 4], fill_value=np.nan,
                       columns=["company", "price", "market cap"])
frame4

##    company   price  market cap
## 0  Daimler   69.20         NaN
## 2  Siemens  110.92         NaN
## 3     BASF   87.28         NaN
## 4      BMW   87.81         NaN
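reindex() also accepts method="ffill", which carries the last valid value forward; this requires a monotonic index. The sketch below applies it to a hypothetical price Series.

```python
import pandas as pd

prices = pd.Series([10.0, 11.5, 12.0], index=[0, 2, 4])  # illustrative values

# New labels 1 and 3 are missing: NaN by default ...
print(prices.reindex(range(5)).tolist())                  # [10.0, nan, 11.5, nan, 12.0]
# ... or filled with the previous value using method="ffill"
print(prices.reindex(range(5), method="ffill").tolist())  # [10.0, 10.0, 11.5, 11.5, 12.0]
```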
Fill NaN

DataFrame.fillna(value): Fills NaNs with value.

Filling NaN
frame4[:3]

##    company   price  market cap
## 0  Daimler   69.20         NaN
## 2  Siemens  110.92         NaN
## 3     BASF   87.28         NaN

frame4.fillna(1000000, inplace=True)
frame4[:3]

##    company   price  market cap
## 0  Daimler   69.20   1000000.0
## 2  Siemens  110.92   1000000.0
## 3     BASF   87.28   1000000.0

The option inplace=True fills the NaNs in the existing DataFrame (here frame4) and returns None. Without inplace=True, fillna() returns a new, filled DataFrame and leaves frame4 unchanged.
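To see the difference between the two variants, the following sketch uses a small hypothetical stand-in for frame4 (three rows with an empty "market cap" column). It shows that fillna() without inplace=True leaves the original untouched, and that a dict can supply a separate fill value per column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for frame4: the "market cap" column is all NaN
frame4 = pd.DataFrame({
    "company": ["Daimler", "Siemens", "BASF"],
    "price": [69.20, 110.92, 87.28],
    "market cap": [np.nan, np.nan, np.nan],
})

# Without inplace=True, fillna() returns a new DataFrame ...
filled = frame4.fillna(1000000)

# ... and frame4 itself still contains NaN
print(frame4["market cap"].isna().all())   # True
print(filled["market cap"].tolist())       # [1000000.0, 1000000.0, 1000000.0]

# A dict fills each listed column with its own value
per_column = frame4.fillna({"market cap": 0})
```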
Dropping entries

DataFrame.drop(labels, axis): Returns a new object with the given labels removed from the requested axis (axis=0 for rows, axis=1 for columns).

Dropping index
frame5 = frame
frame5

##    company   price   volume
## 0  Daimler   69.20  4456290
## 1     E.ON    8.11  3667975
## 2  Siemens  110.92  3669487
## 3     BASF   87.28  1778058
## 4      BMW   87.81  1824582

frame5.drop([1, 2])

##    company  price   volume
## 0  Daimler  69.20  4456290
## 3     BASF  87.28  1778058
## 4      BMW  87.81  1824582

Note that frame5 = frame does not copy the data: both names refer to the same DataFrame. Use frame.copy() to get an independent copy.
Dropping entries

Dropping column
frame5[:2]

##    company  price   volume
## 0  Daimler  69.20  4456290
## 1     E.ON   8.11  3667975

frame5.drop("price", axis=1)[:3]

##    company   volume
## 0  Daimler  4456290
## 1     E.ON  3667975
## 2  Siemens  3669487

frame5.drop(2, axis=0)

##    company  price   volume
## 0  Daimler  69.20  4456290
## 1     E.ON   8.11  3667975
## 3     BASF  87.28  1778058
## 4      BMW  87.81  1824582
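A short sketch, again on frame rebuilt from the printed values, of two details worth knowing: frame5 = frame does not copy anything, so copy() is used here to get an independent DataFrame, and drop() also accepts the index= and columns= keywords instead of a labels/axis pair.

```python
import pandas as pd

# Rebuild frame from the values printed on the slides above
frame = pd.DataFrame({
    "company": ["Daimler", "E.ON", "Siemens", "BASF", "BMW"],
    "price": [69.20, 8.11, 110.92, 87.28, 87.81],
    "volume": [4456290, 3667975, 3669487, 1778058, 1824582],
})

# frame5 = frame would only add a second name for the same object;
# copy() creates an independent DataFrame
frame5 = frame.copy()

# The index= and columns= keywords replace the labels/axis pair
smaller = frame5.drop(index=[1, 2], columns="price")
print(smaller)

# drop() returned a new object, so frame5 still has all rows and columns
print(frame5.shape)    # (5, 3)
```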
Indexing, selecting and filtering

Indexing DataFrames works like indexing a NumPy array; you can use both the default index values and a manually set index.

Indexing
frame

##    company   price   volume
## 0  Daimler   69.20  4456290
## 1     E.ON    8.11  3667975
## 2  Siemens  110.92  3669487
## 3     BASF   87.28  1778058
## 4      BMW   87.81  1824582

frame[2:]

##    company   price   volume
## 2  Siemens  110.92  3669487
## 3     BASF   87.28  1778058
## 4      BMW   87.81  1824582
Indexing, selecting and filtering

Indexing
frame6 = pd.DataFrame(data, index=["a", "b", "c", "d", "e"])
frame6

##    company   price   volume
## a  Daimler   69.20  4456290
## b     E.ON    8.11  3667975
## c  Siemens  110.92  3669487
## d     BASF   87.28  1778058
## e      BMW   87.81  1824582

frame6["b":"d"]

##    company   price   volume
## b     E.ON    8.11  3667975
## c  Siemens  110.92  3669487
## d     BASF   87.28  1778058

When slicing with labels, the end label is inclusive.
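The inclusive end point applies only to label-based slicing. Here is a minimal comparison on a one-column version of frame6 (prices copied from the output above): a positional slice with iloc excludes the end position, just like Python lists.

```python
import pandas as pd

# One-column version of frame6 with the letter index used above
frame6 = pd.DataFrame(
    {"price": [69.20, 8.11, 110.92, 87.28, 87.81]},
    index=["a", "b", "c", "d", "e"],
)

# Label-based slice: the end label "d" is included
by_label = frame6.loc["b":"d"]

# Position-based slice: the end position 3 is excluded
by_position = frame6.iloc[1:3]

print(list(by_label.index))     # ['b', 'c', 'd']
print(list(by_position.index))  # ['b', 'c']
```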
Indexing, selecting and filtering

DataFrame.loc[]: Selects a subset of rows and columns from a DataFrame using axis labels.
DataFrame.iloc[]: Selects a subset of rows and columns from a DataFrame using integer positions.

Selection with loc and iloc
frame6.loc["c", ["company", "price"]]

## company    Siemens
## price       110.92
## Name: c, dtype: object

frame6.iloc[2, [0, 1]]

## company    Siemens
## price       110.92
## Name: c, dtype: object
Indexing, selecting and filtering

Selection with loc and iloc
frame6.loc[["c", "d", "e"], ["volume", "price", "company"]]

##     volume   price  company
## c  3669487  110.92  Siemens
## d  1778058   87.28     BASF
## e  1824582   87.81      BMW

frame6.iloc[2:, ::-1]

##     volume   price  company
## c  3669487  110.92  Siemens
## d  1778058   87.28     BASF
## e  1824582   87.81      BMW

Both indexers accept slices and lists: loc with labels, iloc with integer positions.
There are many ways to select and rearrange pandas objects.
DataFrame indexing options

Type                 Description
df[val]              Select single column or set of columns
df.loc[val]          Select single row or set of rows
df.loc[:, val]       Select single column or set of columns
df.loc[val1, val2]   Select row and column by label
df.iloc[where]       Select row or set of rows by integer position
df.iloc[:, where]    Select column or set of columns by integer position
df.iloc[w1, w2]      Select row and column by integer position
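The slides above cover indexing and selecting. Filtering usually means selecting rows with a boolean condition, and both df[...] and df.loc[...] accept such a mask. A minimal sketch on frame6 rebuilt from the printed values:

```python
import pandas as pd

# Rebuild frame6 from the values printed above
frame6 = pd.DataFrame({
    "company": ["Daimler", "E.ON", "Siemens", "BASF", "BMW"],
    "price": [69.20, 8.11, 110.92, 87.28, 87.81],
    "volume": [4456290, 3667975, 3669487, 1778058, 1824582],
}, index=["a", "b", "c", "d", "e"])

# A boolean Series keeps the rows where the condition is True
expensive = frame6[frame6["price"] > 85]

# loc combines a row mask with a column selection
names = frame6.loc[frame6["volume"] > 3000000, "company"]

print(list(expensive.index))   # ['c', 'd', 'e']
print(names.tolist())          # ['Daimler', 'E.ON', 'Siemens']
```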
Hierarchical indexing

Hierarchical indexing enables you to have multiple index levels.

MultiIndex
ind = [["a", "a", "a", "b", "b"], [1, 2, 3, 1, 2]]
frame6 = pd.DataFrame(np.arange(15).reshape((5, 3)), index=ind,
                      columns=["first", "second", "third"])
frame6

##      first  second  third
## a 1      0       1      2
##   2      3       4      5
##   3      6       7      8
## b 1      9      10     11
##   2     12      13     14

frame6.index.names = ["index1", "index2"]
frame6.index

## MultiIndex([('a', 1),
##             ('a', 2),
##             ('a', 3),
##             ('b', 1),
##             ('b', 2)],
##            names=['index1', 'index2'])
Hierarchical indexing

Selecting from a MultiIndex
frame6.loc["a"]

##         first  second  third
## index2
## 1           0       1      2
## 2           3       4      5
## 3           6       7      8

frame6.loc["b", 1]

## first      9
## second    10
## third     11
## Name: (b, 1), dtype: int64
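Besides selecting on the outer level, you can address a single row with a tuple of labels, and xs() selects on an inner level by its name. Here is a sketch using the frame6 from the MultiIndex slide:

```python
import numpy as np
import pandas as pd

# frame6 as constructed on the MultiIndex slide
ind = [["a", "a", "a", "b", "b"], [1, 2, 3, 1, 2]]
frame6 = pd.DataFrame(np.arange(15).reshape((5, 3)), index=ind,
                      columns=["first", "second", "third"])
frame6.index.names = ["index1", "index2"]

# A tuple addresses one row by both levels; the second key picks the column
value = frame6.loc[("b", 1), "second"]

# xs() selects on an inner level: all rows with index2 == 1
inner = frame6.xs(1, level="index2")

print(value)                     # 10
print(inner["first"].tolist())   # [0, 9]
```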
Operations between DataFrames and Series

Series and DataFrames
frame7 = frame[["price", "volume"]]
frame7.index = ["Daimler", "E.ON", "Siemens", "BASF", "BMW"]
series = frame7.iloc[2]
frame7

##           price   volume
## Daimler   69.20  4456290
## E.ON       8.11  3667975
## Siemens  110.92  3669487
## BASF      87.28  1778058
## BMW       87.81  1824582

series

## price         110.92
## volume    3669487.00
## Name: Siemens, dtype: float64

Here the Series was generated from the third row (integer position 2, Siemens) of the DataFrame.
Operations between DataFrames and Series

Operations between Series and DataFrames down the rows
frame7 + series

##           price     volume
## Daimler  180.12  8125777.0
## E.ON     119.03  7337462.0
## Siemens  221.84  7338974.0
## BASF     198.20  5447545.0
## BMW      198.73  5494069.0

By default, arithmetic operations between a DataFrame and a Series match the index of the Series against the DataFrame's columns. The operation is then broadcast down the rows.
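Matching by labels also means that a label present on only one side produces NaN. Here is a small sketch with two rows of frame7 and hypothetical Series. The "dividend" label is made up for illustration.

```python
import pandas as pd

# Two rows of frame7, values copied from the output above
frame7 = pd.DataFrame(
    {"price": [69.20, 8.11], "volume": [4456290, 3667975]},
    index=["Daimler", "E.ON"],
)

# The Series labels are matched against the DataFrame's columns
shift = pd.Series({"price": 1.0, "volume": 10.0})
shifted = frame7 + shift

# "dividend" exists only in the Series and "volume" only in the DataFrame:
# both columns appear in the result, filled with NaN
partial = pd.Series({"price": 1.0, "dividend": 0.5})
result = frame7 + partial
print(sorted(result.columns))   # ['dividend', 'price', 'volume']
```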
Operations between DataFrames and Series

Operations between Series and DataFrames down the columns
series2 = frame7["price"]
frame7.add(series2, axis=0)

##           price      volume
## Daimler  138.40  4456359.20
## E.ON      16.22  3667983.11
## Siemens  221.84  3669597.92
## BASF     174.56  1778145.28
## BMW      175.62  1824669.81

Here the Series was generated from the price column. With axis=0, the index of the Series is matched against the DataFrame's row index, and the arithmetic operation is broadcast across the columns.
Operations between DataFrames and Series

Pandas vs NumPy
nparr = np.arange(12.).reshape((3, 4))
row = nparr[0]
nparr - row

## array([[0., 0., 0., 0.],
##        [4., 4., 4., 4.],
##        [8., 8., 8., 8.]])

Operations between DataFrames and Series are similar to operations between one- and two-dimensional NumPy arrays: as with a DataFrame and a Series, the arithmetic operation is broadcast down the rows.
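NumPy has no axis argument for this kind of operation. To broadcast down the columns instead, you first reshape the one-dimensional array into a column. A minimal sketch:

```python
import numpy as np

nparr = np.arange(12.).reshape((3, 4))

# Row broadcasting (as above): a shape-(4,) array is subtracted from every row
by_row = nparr - nparr[0]

# Column broadcasting: reshape the first column to shape (3, 1) first,
# the NumPy counterpart of DataFrame.sub(series, axis=0)
col = nparr[:, 0].reshape((3, 1))
by_col = nparr - col

print(by_col)
# [[0. 1. 2. 3.]
#  [0. 1. 2. 3.]
#  [0. 1. 2. 3.]]
```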
NumPy functions on DataFrames

DataFrame.apply(np.function, axis): Applies a NumPy function along an axis of the DataFrame (by default axis=0, i.e. to each column). See also the statistical and mathematical NumPy functions.

NumPy functions on DataFrames
frame7[:2]

##          price   volume
## Daimler  69.20  4456290
## E.ON      8.11  3667975

frame7.apply(np.mean)

## price          72.664
## volume    3079278.400
## dtype: float64

frame7.apply(np.sqrt)[:2]

##             price       volume
## Daimler  8.318654  2110.992657
## E.ON     2.847806  1915.195812
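apply() is not limited to NumPy functions: any function that takes a Series works, and axis=1 passes rows instead of columns. The sketch below uses frame7 rebuilt from the printed values. The column range and the price times volume product are illustrative choices, not part of the original example.

```python
import pandas as pd

# Rebuild frame7 from the values printed above
frame7 = pd.DataFrame({
    "price": [69.20, 8.11, 110.92, 87.28, 87.81],
    "volume": [4456290, 3667975, 3669487, 1778058, 1824582],
}, index=["Daimler", "E.ON", "Siemens", "BASF", "BMW"])

# axis=0 (default): the function receives each column as a Series
col_range = frame7.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row, here to compute price * volume
turnover = frame7.apply(lambda row: row["price"] * row["volume"], axis=1)

print(round(col_range["price"], 2))   # 102.81
```

For simple arithmetic like this, the vectorised frame7["price"] * frame7["volume"] gives the same result and is usually faster than a row-wise apply().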
Grouping DataFrames

DataFrame.groupby([col1, col2]): Groups the DataFrame by the given columns (grouping by one column or by more than two columns is also possible). See also how to import data from CSV files.

Groupby
vote = pd.read_csv("data/vote.csv")[["Party", "Member", "Vote"]]
vote.head()

##      Party     Member    Vote
## 0  CDU/CSU   Abercron     yes
## 1  CDU/CSU     Albani     yes
## 2  CDU/CSU  Altenkamp     yes
## 3  CDU/CSU   Altmaier  absent
## 4  CDU/CSU     Amthor     yes

Appending count() or mean() to groupby() returns the number of non-missing values or the mean of the remaining columns within each group.
Grouping DataFrames

Groupby
res = vote.groupby(["Party", "Vote"]).count()
res

##                      Member
## Party        Vote
## AfD          absent       6
##              no          86
## BÜ90/GR      absent       9
##              no          58
## CDU/CSU      absent       7
##              yes        239
## DIE LINKE.   absent       7
##              no          62
## FDP          absent       5
##              no          75
## Fraktionslos absent       1
##              no           1
## SPD          absent       6
##              yes        147
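vote.csv is not reproduced here, so the following sketch uses a hypothetical five-member vote to show two common follow-ups. First, size() counts rows per group and unstack() turns the inner group level into columns. Second, the mean of a boolean column gives a share per group.

```python
import pandas as pd

# Hypothetical mini vote (made-up parties and members, not the data above)
vote = pd.DataFrame({
    "Party": ["A", "A", "A", "B", "B"],
    "Member": ["m1", "m2", "m3", "m4", "m5"],
    "Vote": ["yes", "yes", "no", "no", "absent"],
})

# Number of members per party and vote
counts = vote.groupby(["Party", "Vote"]).size()

# unstack() moves the inner level (Vote) into columns; missing combinations become 0
table = counts.unstack(fill_value=0)
print(table)

# Share of "yes" votes per party: the mean of a boolean Series
share_yes = vote["Vote"].eq("yes").groupby(vote["Party"]).mean()
```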
Section 3.4

Data formats and handling
Import/Export data
Reading data in text format

ex1.csv
a, b, c, d, hello
1, 2, 3, 4, world
5, 6, 7, 8, python
2, 3, 5, 7, pandas

pd.read_csv("file"): Reads a CSV file into a DataFrame.

Read comma-separated values
df = pd.read_csv("data/ex1.csv")
df

##    a  b  c  d   hello
## 0  1  2  3  4   world
## 1  5  6  7  8  python
## 2  2  3  5  7  pandas
Reading data in text format

tab.txt
a| b| c| d| hello
1| 2| 3| 4| world
5| 6| 7| 8| python
2| 3| 5| 7| pandas

pd.read_table("file", sep): Reads a table with arbitrary separators into a DataFrame.

Read table values
df = pd.read_table("data/tab.txt", sep="|")
df

##    a  b  c  d   hello
## 0  1  2  3  4   world
## 1  5  6  7  8  python
## 2  2  3  5  7  pandas
Reading data in text format

ex2.csv
1, 2, 3, 4, world
5, 6, 7, 8, python
2, 3, 5, 7, pandas

CSV file without header row:

Read CSV and header settings
df = pd.read_csv("data/ex2.csv", header=None)
df

##    0  1  2  3       4
## 0  1  2  3  4   world
## 1  5  6  7  8  python
## 2  2  3  5  7  pandas
Reading data in text format

ex2.csv
1, 2, 3, 4, world
5, 6, 7, 8, python
2, 3, 5, 7, pandas

Specify the column names with names:

Read CSV and header names
df = pd.read_csv("data/ex2.csv",
                 names=["a", "b", "c", "d", "hello"])
df

##    a  b  c  d   hello
## 0  1  2  3  4   world
## 1  5  6  7  8  python
## 2  2  3  5  7  pandas
Reading data in text format

ex2.csv
1, 2, 3, 4, world
5, 6, 7, 8, python
2, 3, 5, 7, pandas

Use the hello column as the index:

Read CSV and specify index
df = pd.read_csv("data/ex2.csv",
                 names=["a", "b", "c", "d", "hello"],
                 index_col="hello")
df

##         a  b  c  d
## hello
## world   1  2  3  4
## python  5  6  7  8
## pandas  2  3  5  7
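The files in data/ are not included here, so this sketch reads the listing of ex1.csv from an in-memory string via io.StringIO. It assumes the file really contains a space after each comma, as the listing suggests. skipinitialspace=True removes those spaces from both the header and the values. The second call combines names= and index_col= as on the previous slides.

```python
import io

import pandas as pd

# In-memory stand-in for data/ex1.csv (copied from the listing above)
text = """a, b, c, d, hello
1, 2, 3, 4, world
5, 6, 7, 8, python
2, 3, 5, 7, pandas
"""

# Without skipinitialspace the column names would be " b", " c", ...
df = pd.read_csv(io.StringIO(text), skipinitialspace=True)
print(list(df.columns))        # ['a', 'b', 'c', 'd', 'hello']

# The same text without its header line (like ex2.csv), with names and index
body = text.split("\n", 1)[1]
df2 = pd.read_csv(io.StringIO(body), names=["a", "b", "c", "d", "hello"],
                  index_col="hello", skipinitialspace=True)
print(df2.loc["python", "b"])  # 6
```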
Reading data in text format

ex3.csv
1, 2, 3, 4, world
#+#-.,.-'*'-.,
5, 6, 7, 8, python
87646756754456978
2, 3, 5, 7, pandas

Skip rows while reading (skiprows takes 0-based line numbers of the file):

Read CSV and choose rows
df = pd.read_csv("data/ex3.csv", skiprows=[1, 3])
df

##    1  2  3  4   world
## 0  5  6  7  8  python
## 1  2  3  5  7  pandas
Writing data to text file

DataFrame.to_csv("filename"): Writes a DataFrame to a CSV file.

Write to CSV
df = pd.read_csv("data/ex3.csv", skiprows=[1, 3])
df.to_csv("out/out1.csv")

out1.csv
,1, 2, 3, 4, world
0,5,6,7,8, python
1,2,3,5,7, pandas

By default, both the index and the header are written to the file. The index column has no name, which is why the first line starts with an empty field (,1).
Writing data to text file

Write to CSV and settings
df = pd.read_csv("data/ex3.csv", skiprows=[1, 3])
df.to_csv("out/out2.csv", index=False, header=False)

out2.csv
5,6,7,8, python
2,3,5,7, pandas
Writing data to text file

Write to CSV and specify header
df = pd.read_csv("data/ex3.csv", skiprows=[1, 3, 4])
df.to_csv("out/out3.csv", index=False,
          header=["a", "b", "c", "d", "e"])

out3.csv
a,b,c,d,e
5,6,7,8, python
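When no file name is given, to_csv() returns the CSV text as a string, which is handy for checking the output before writing a file. The sketch below uses a hypothetical DataFrame shaped like df from ex3.csv. It also writes to an in-memory buffer and reads the data back.

```python
import io

import pandas as pd

# Hypothetical DataFrame shaped like df from ex3.csv
df = pd.DataFrame({"1": [5, 2], "2": [6, 3], "3": [7, 5], "4": [8, 7],
                   "world": ["python", "pandas"]})

# Without a file name, to_csv() returns the CSV text as a string
print(df.to_csv(index=False))

# Writing to an in-memory buffer and reading it back restores the data
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
```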
Reading Excel files

pd.read_excel("file.xls"): Reads an Excel file (here .xls) into a DataFrame.

Figure: goog.xls

Reading Excel
xls_frame = pd.read_excel("data/goog.xls")
Reading Excel files

Excel as a DataFrame
xls_frame[["Adj Close", "Volume", "High"]]

##      Adj Close   Volume         High
## 0  1169.939941  1538700  1173.000000
## 1  1167.699951  2412100  1174.000000
## 2  1111.900024  4857900  1123.069946
## 3  1055.800049  3798300  1110.000000
## 4  1080.599976  3448000  1081.709961
## 5  1048.579956  2341700  1081.780029
Remote data access

Extract financial data from Internet sources into a DataFrame. Different sources offer different kinds of data. Some sources are:

Robinhood
IEX
Yahoo Finance
World Bank
OECD
Eurostat

A complete list of the sources and their usage can be found here: pandas-datareader

Import pandas-datareader
from pandas_datareader import data
Data access: Yahoo Finance

data.DataReader("symbol", "source", "start", "end"): Returns financial data for a stock over a given time period.

Get data of Ford
ford = data.DataReader("F", "yahoo", "2020-01-01", "2020-01-31")
ford.head()[["Close", "Volume"]]

##             Close      Volume
## Date
## 2020-01-02   9.42  43425700.0
## 2020-01-03   9.21  45040800.0
## 2020-01-06   9.16  43372300.0
## 2020-01-07   9.25  44984100.0
## 2020-01-08   9.25  45994900.0

Stock code list
Data access: Yahoo Finance

Explore Ford dataset
ford.index

## DatetimeIndex(['2020-01-02', '2020-01-03',...
## ...dtype='datetime64[ns]', name='Date',...

ford.loc["2020-01-28"]

## High         9.000000e+00
## Low          8.860000e+00
## Open         8.940000e+00
## Close        8.970000e+00
## Volume       8.516340e+07
## Adj Close    8.730923e+00
## Name: 2020-01-28 00:00:00, dtype: float64

DataFrame index
The index of the DataFrame differs between sources. Always check DataFrame.index!
Data access: Yahoo Finance

Download and explore SAP data
sap = data.DataReader("SAP", "yahoo", "2020-01-01", "2020-06-30")
sap[25:27]

##                   High         Low  ...    Volume   Adj Close
## Date                                ...
## 2020-02-07  136.020004  134.639999  ...  511700.0  130.987106
## 2020-02-10  135.369995  134.679993  ...  381200.0  131.151978
##
## [2 rows x 6 columns]

sap.loc["2020-03-09"]

## High         1.161900e+02
## Low          1.105500e+02
## Open         1.136100e+02
## Close        1.115000e+02
## Volume       1.571800e+06
## Adj Close    1.081376e+02
## Name: 2020-03-09 00:00:00, dtype: float64
Data access: Eurostat

Eurostat
population = data.DataReader("tps00001", "eurostat", "2010-01-01",
                             "2020-01-01")
population.columns

## MultiIndex(levels=[[Population on 1 January - total], [Albania,
## Andorra, Armenia, Austria, Azerbaijan, Belarus, Belgium, ...

population["Population on 1 January - total", "France"][-5:]

## FREQ            Annual
## TIME_PERIOD
## 2016-01-01  66638391.0
## 2017-01-01  66809816.0
## 2018-01-01  66918941.0
## 2019-01-01  67012883.0
## 2020-01-01  67098824.0

Eurostat Database
Read data from HTML

Website used for the example: Econometrics

Beautiful Soup
from bs4 import BeautifulSoup
import requests

url = "www.uni-goettingen.de/de/applied-econometrics/412565.html"
r = requests.get("https://" + url)
d = r.text
soup = BeautifulSoup(d, "lxml")
soup.title

## <title>Applied Econometrics - Georg-August-... ...</title>

Reading data from HTML in detail is beyond the scope of this course. If you are interested in this kind of data import, you can find detailed information on Beautiful Soup here.
Motivation

Bollinger
import pandas as pd
import matplotlib.pyplot as plt
from pandas_datareader import data

# Daily SAP prices from Yahoo Finance
sap = data.DataReader("SAP", "yahoo", "2019-01-01", "2020-08-31")
sap.index = pd.to_datetime(sap.index)

# 20-day rolling mean and rolling standard deviation of the closing price
boll = sap["Close"].rolling(window=20, center=False).mean()
std = sap["Close"].rolling(window=20, center=False).std()

# Bands: rolling mean plus/minus two rolling standard deviations
upp = boll + std * 2
low = boll - std * 2

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
boll.plot(ax=ax, label="20 days Rolling mean")
upp.plot(ax=ax, label="Upper Band")
low.plot(ax=ax, label="Lower Band")
sap["Close"].plot(ax=ax, label="SAP Price")
ax.legend(loc="best")
fig.savefig("out/boll.pdf")
Motivation

[Figure: SAP closing price ("SAP Price") with its 20-day rolling mean ("20 days Rolling mean") and the "Upper Band" and "Lower Band", January 2019 to August 2020; x-axis: Date]
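The Yahoo Finance download needs a network connection. The following sketch therefore applies the same band construction to a synthetic random-walk price series (made-up data, not SAP prices). It also counts the days on which the price closes outside the bands.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk "closing prices" on business days (no download needed)
rng = np.random.default_rng(seed=0)
dates = pd.bdate_range("2020-01-01", periods=120)
close = pd.Series(100 + rng.normal(0, 1, len(dates)).cumsum(), index=dates)

# Same construction as above: 20-day rolling mean +/- 2 rolling standard deviations
window = 20
mid = close.rolling(window=window).mean()
std = close.rolling(window=window).std()
bands = pd.DataFrame({"close": close, "mid": mid,
                      "upper": mid + 2 * std, "lower": mid - 2 * std})

# The first window - 1 values are NaN because the window is not yet full
print(bands["mid"].isna().sum())   # 19

# Days on which the price closed outside the bands
outside = bands[(bands["close"] > bands["upper"]) | (bands["close"] < bands["lower"])]
print(len(outside))
```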
Chapter 4

Visual illustrations
4.1 Matplotlib package
4.2 Figures and subplots
4.3 Plot types and styles
4.4 Pandas layers
Section 4.1

Visual illustrations
Matplotlib package
matplotlib

The package matplotlib is a free software library for Python with the following features:

Image plots, contour plots, scatter plots, polar plots, line plots and 3D plots
A variety of hardcopy formats
Works in Python scripts, the Python and IPython shells and the Jupyter notebook
Interactive environments
matplotlib

Usage of matplotlib
matplotlib has a vast number of functions and options, which are hard to remember. But for almost every task there is an example you can take code from. A great source of information is the examples gallery on the matplotlib homepage. Also note the best practice quick start guide.
Simple plot

plt.plot(array): Plots the values of an array or list; by default, the X-axis has the range [0, 1, ..., n-1].

Import matplotlib and simple example
import matplotlib.pyplot as plt
import numpy as np
plt.plot(np.arange(10))
plt.savefig("out/list.pdf")

[Figure: out/list.pdf, a line plot of the values 0 to 9 against the index 0 to 9]
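A minimal sketch of the default X-axis, assuming the non-interactive Agg backend so that it also runs without a display: plt.plot() returns the created line objects, and get_xdata() shows that a single array is plotted against the positions 0, 1, ..., n-1. Passing an explicit X array as first argument replaces these positions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

y = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

# Only Y-values: matplotlib uses the positions 0, 1, ..., n-1 as X-values
(line_default,) = plt.plot(y)
print(line_default.get_xdata())

# Explicit X-values as first argument, Y-values as second
x = np.linspace(0, 1, len(y))
(line_explicit,) = plt.plot(x, y)
print(line_explicit.get_xdata())

plt.close("all")
```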
Section 4.2

Visual illustrations: Figures and subplots
Figures

Plots in matplotlib reside in a Figure object:
- plt.figure(...): Creates a new Figure object; it accepts multiple parameters, e.g. figsize as (width, height) in inches.
- plt.gcf(): Returns a reference to the active ("current") figure.

Create Figures
fig = plt.figure(figsize=(16, 8))
print(plt.gcf())

## Figure(1600x800)

- A Figure object can be considered as an empty window.
- The Figure object has a number of options, such as the size or the aspect ratio.
- You cannot draw a plot in a blank figure. There has to be a subplot in the Figure object.
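A small sketch of where the size Figure(1600x800) comes from: figsize is measured in inches and multiplied by the resolution dpi (dots per inch). The dpi=100 below is set explicitly as an assumption; it is matplotlib's usual default, but notebook backends may use a different value.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# figsize is given in inches; dpi (dots per inch) converts inches into pixels
fig = plt.figure(figsize=(16, 8), dpi=100)

is_active = plt.gcf() is fig  # the new figure is now the active one
n_subplots = len(fig.axes)    # an empty "window": no subplot yet

width_in, height_in = fig.get_size_inches()
width_px, height_px = width_in * fig.dpi, height_in * fig.dpi
print(width_px, height_px, is_active, n_subplots)
plt.close(fig)
```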
Saving plots to file

plt.savefig("filename"): Saves the active figure to file; the file format is inferred from the filename extension.

Available file formats include:

Filename extension   Description
.png                 Portable Network Graphics
.pdf                 Portable Document Format
.svg                 Scalable Vector Graphics
.jpeg                JPEG File Interchange Format
.jpg                 JPEG File Interchange Format
.ps                  PostScript
.raw                 Raw Image Format
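A sketch of saving one figure in several formats. Two points worth knowing: fig.canvas.get_supported_filetypes() lists the formats the installed backend can write, and savefig() does not create missing folders, so the folder out used in this course has to exist (os.makedirs("out", exist_ok=True) creates it). The temporary folder below only stands in for out.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.plot(np.arange(10))

# Dictionary of the formats the backend can write, e.g. "pdf": "Portable Document Format"
supported = fig.canvas.get_supported_filetypes()
print(sorted(supported))

out_dir = tempfile.mkdtemp()  # stands in for the folder "out"
saved = []
for ext in ["png", "pdf", "svg"]:
    path = os.path.join(out_dir, "example." + ext)
    # The format follows from the extension; dpi only matters for raster formats like PNG
    fig.savefig(path, dpi=150)
    saved.append(path)
plt.close(fig)
```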
Subplots

fig.add_subplot(): Adds a subplot to the Figure fig.
Example: fig.add_subplot(2, 2, 1) divides the figure into a 2 x 2 grid and adds a subplot in the first (top-left) cell; the other cells remain empty until further subplots are added.

Adding subplots
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
ax4 = fig.add_subplot(2, 2, 4)
fig.savefig("out/subplots.pdf")

- The Figure object is filled with subplots in which the plots reside.
- If you use the plt.plot() command without creating a subplot in advance, matplotlib creates a Figure object and a subplot automatically.
- The Figure object and its subplots can be created in one line.
Subplots

[Figure: out/subplots.pdf, four empty subplots in a 2 x 2 grid, each with both axes ranging from 0.0 to 1.0]
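The grid arguments of fig.add_subplot() can be checked directly: each call adds exactly one subplot, counted row by row starting at 1, and fig.axes lists the subplots that exist so far. The three-digit shorthand 223 for (2, 2, 3) is a matplotlib convenience for grids smaller than 10 x 10. A sketch, assuming the Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt

fig = plt.figure()
counts = []

ax1 = fig.add_subplot(2, 2, 1)  # top-left cell of a 2 x 2 grid
counts.append(len(fig.axes))    # only one subplot so far

ax4 = fig.add_subplot(2, 2, 4)  # bottom-right cell
counts.append(len(fig.axes))

ax3 = fig.add_subplot(223)      # shorthand for (2, 2, 3): bottom-left cell
counts.append(len(fig.axes))

print(counts)  # [1, 2, 3]
plt.close(fig)
```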
Subplots

Filling subplots with content
from numpy.random import randn
ax1.plot([5, 7, 4, 3, 1])
ax2.hist(randn(100), bins=20, color="r")
ax3.scatter(np.arange(30), np.arange(30) * randn(30))
ax4.plot(randn(40), "k--")
fig.savefig("out/content.pdf")

- The subplots in one Figure object can be filled with different plot types.
- Using only plt.plot(), matplotlib draws the plot in the last Figure object and the last subplot selected.
Subplots

[Figure: out/content.pdf, a line plot (top left), a red histogram (top right), a scatter plot (bottom left) and a black dashed line plot (bottom right)]
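A sketch of which subplot the pyplot functions use: plt.gca() ("get current axes") returns the subplot that plt.plot() draws into, which is the one created or selected last; plt.sca() ("set current axes") selects another one. The Agg backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax4 = fig.add_subplot(2, 2, 4)  # created last, so it is the current subplot

plt.plot(np.arange(5))          # lands in ax4
current_is_ax4 = plt.gca() is ax4

plt.sca(ax1)                    # make ax1 the current subplot
plt.plot(np.arange(5)[::-1])    # lands in ax1

print(current_is_ax4, len(ax1.lines), len(ax4.lines))  # True 1 1
plt.close(fig)
```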
Standard creation of plots

plt.subplots(nrows, ncols, sharex, sharey): Creates a figure and a grid of subplots in one line. If sharex or sharey is True, all subplots share the same X- or Y-axis, i.e. the same limits and ticks.

Standard creation
fig, axes = plt.subplots(2, 3, figsize=(16, 8), sharey=True)
axes[1, 1].plot(np.arange(7), color="r")
axes[0, 2].plot(np.arange(10, 0, -1))
fig.savefig("out/standard.pdf")

[Figure: out/standard.pdf, a 2 x 3 grid of subplots with a shared Y-axis; the top-right subplot shows a falling line, the bottom-middle subplot a red rising line]
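A sketch of what plt.subplots() returns: a NumPy array of subplots indexed as axes[row, column]. With the default squeeze=True, a single row or column is returned as a one-dimensional array (as used with ax[0] under Plot types) and a 1 x 1 grid as a single subplot object. The check of the Y-limits illustrates sharey=True; the Agg backend is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 3, figsize=(16, 8), sharey=True)
print(axes.shape)  # (2, 3)
axes[1, 1].plot(np.arange(7), color="r")
axes[0, 2].plot(np.arange(10, 0, -1))

# sharey=True: every subplot reports the same Y-limits
ylims = {ax.get_ylim() for ax in axes.flat}
print(len(ylims))  # 1

# squeeze=True (default) reduces the array dimensions where possible
fig_row, ax_row = plt.subplots(1, 3)
fig_one, ax_one = plt.subplots(1, 1)
print(ax_row.shape, type(ax_one).__name__)
plt.close("all")
```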
Section 4.3

Visual illustrations: Plot types and styles
Plot types

- ax.scatter(x, y): Creates a scatter plot of x vs y.
- ax.hist(x, bins): Creates a histogram of x with the given number of bins.
- ax.fill_between(x, y, a): Creates a plot of x vs y and fills the area between a and y.

Types
fig, ax = plt.subplots(1, 3, figsize=(16, 8))
ax[0].hist([1, 2, 3, 4, 5, 4, 3, 2, 3, 4, 2, 3, 4, 4],
           bins=5, color="yellow")
x = np.arange(0, 10, 0.1)
y = np.sin(x)
ax[1].fill_between(x, y, 0, color="green")
ax[2].scatter(x, y)
fig.savefig("out/types.pdf")

A vast number of plot types can be found in the examples gallery.
Plot types

[Figure: out/types.pdf, a yellow histogram, the green filled area between the sine curve and zero, and a scatter plot of the sine curve]
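ax.hist() returns its results as well as drawing them: the bar heights, the bin edges and the drawn bars. A sketch with the data from the slide, compared with np.histogram(), which computes the same numbers without plotting. With bins=5, the range from 1 to 5 is split into five bins of width 0.8, and the last bin includes its right edge.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

data = [1, 2, 3, 4, 5, 4, 3, 2, 3, 4, 2, 3, 4, 4]

fig, ax = plt.subplots()
# Bar heights, bin edges and the drawn bars
counts, edges, bars = ax.hist(data, bins=5, color="yellow")

# The same numbers without drawing
np_counts, np_edges = np.histogram(data, bins=5)

print(counts)  # [1. 3. 4. 5. 1.]
print(edges)
plt.close(fig)
```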
Adjusting the spacing around subplots

plt.subplots_adjust(left, bottom, ..., hspace): Sets the space around and between the subplots. wspace and hspace control the spacing between subplots as a fraction of the average subplot width and height, respectively.

Adjust spacing
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for i in range(2):
    for j in range(2):
        axes[i, j].plot(randn(10))
plt.subplots_adjust(wspace=0, hspace=0)
fig.savefig("out/spacing.pdf")
Adjusting the spacing around subplots

[Figure: out/spacing.pdf, four random line plots in a 2 x 2 grid without any space between the subplots]
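A sketch that reads the spacing back from the figure: fig.subplotpars holds the current values, which start at the defaults in matplotlib.rcParams. fig.subplots_adjust() is the object-oriented counterpart of plt.subplots_adjust() and acts on that figure even when it is not the active one. The Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
from numpy.random import randn

fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for ax in axes.flat:  # iterate over all four subplots
    ax.plot(randn(10))

default_wspace = fig.subplotpars.wspace  # starts at rcParams["figure.subplot.wspace"]

# Object-oriented counterpart of plt.subplots_adjust(wspace=0, hspace=0)
fig.subplots_adjust(wspace=0, hspace=0)
print(default_wspace, fig.subplotpars.wspace, fig.subplotpars.hspace)
plt.close(fig)
```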
Colors, markers and line styles

ax.plot(data, linestyle, color, marker): Sets the data and the styles of the plot in subplot ax.

Styles
fig, ax = plt.subplots(1, figsize=(15, 6))
ax.plot(randn(10), linestyle="--", color="darkcyan", marker="p")
fig.savefig("out/style.pdf")

[Figure: out/style.pdf, a dark cyan dashed line with pentagon markers]
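Besides keyword arguments, ax.plot() accepts a short format string combining color, marker and line style, such as "k--" used earlier in this chapter for a black dashed line. A sketch comparing both forms; matplotlib.colors.to_hex() is used only to print the colors in a uniform way.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
from numpy.random import randn

y = randn(10)
fig, ax = plt.subplots()

# Keyword form, as on the slide
(l1,) = ax.plot(y, linestyle="--", color="darkcyan", marker="p")
# Format string: black ("k") dashed ("--") line without markers
(l2,) = ax.plot(y, "k--")
# Format string: red ("r") circles ("o") joined by a dotted (":") line
(l3,) = ax.plot(y, "ro:")

for line in (l1, l2, l3):
    print(line.get_linestyle(), line.get_marker(), mcolors.to_hex(line.get_color()))
plt.close(fig)
```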
Plot colors

[Figure: overview of named plot colors]
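Colors can be given as single letters, as CSS color names such as "darkcyan", as hex strings or as RGB(A) tuples. The module matplotlib.colors contains the named colors as dictionaries, so a sketch like the following can list them or convert a name into other notations.

```python
import matplotlib.colors as mcolors

base = mcolors.BASE_COLORS        # single letters: "b", "g", "r", "c", "m", "y", "k", "w"
tableau = mcolors.TABLEAU_COLORS  # "tab:blue", "tab:orange", ... (default color cycle)
css = mcolors.CSS4_COLORS         # CSS names such as "darkcyan" or "yellow"

print(len(base), len(tableau), len(css))
print(sorted(css)[:5])                 # first names in alphabetical order
print(mcolors.to_hex("darkcyan"))      # #008b8b
print(mcolors.to_rgba("r"))            # red as (R, G, B, alpha) in [0, 1]
```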
Plot line styles

[Figure: overview of plot line styles]
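The four basic line styles can be passed either as short codes or as names; matplotlib stores the short code. A sketch that draws each style once, shifted upwards so the lines do not overlap (the Agg backend is assumed):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

codes = ["-", "--", "-.", ":"]
names = ["solid", "dashed", "dashdot", "dotted"]

x = np.linspace(0, 1, 20)
fig, ax = plt.subplots()
for offset, (code, name) in enumerate(zip(codes, names)):
    # Each line is shifted upwards by 1 so the styles do not overlap
    ax.plot(x, x + offset, linestyle=name, label=f'"{code}" or "{name}"')
ax.legend()

# matplotlib stores the short code even when the name was passed
stored = [line.get_linestyle() for line in ax.lines]
print(stored)  # ['-', '--', '-.', ':']
plt.close(fig)
```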
Plot markers

Marker   Description
"."      point
","      pixel
"o"      circle
"v"      triangle_down
"8"      octagon
"s"      square
"p"      pentagon
"P"      plus (filled)
"*"      star
"h"      hexagon1
"H"      hexagon2
"+"      plus
"x"      x
"X"      x (filled)
"D"      diamond
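A sketch that draws every marker from the table once, each in its own column and labeled with its code; linestyle="" suppresses the connecting line so that only the marker is drawn. Figure size, marker size and color are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

markers = [".", ",", "o", "v", "8", "s", "p", "P", "*", "h", "H", "+", "x", "X", "D"]

fig, ax = plt.subplots(figsize=(10, 2))
for i, m in enumerate(markers):
    # A single point per marker; linestyle="" means no connecting line
    ax.plot(i, 0, marker=m, linestyle="", markersize=12, color="darkcyan")
    ax.text(i, 0.02, repr(m), ha="center")  # marker code above the symbol
ax.set_yticks([])

print([line.get_marker() for line in ax.lines])
plt.close(fig)
```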
Ticks and labels

- ax.set_xticks(): Sets the list of X-ticks; ax.set_yticks() does the same for the Y-axis.
- ax.set_xlabel(): Sets the X-label; analogously ax.set_ylabel().
- ax.set_title(): Sets the subplot title.

Ticks and labels - default
fig, ax = plt.subplots(1, figsize=(15, 10))
ax.plot(randn(1000).cumsum())
fig.savefig("out/withoutlabels.pdf")

- Here, we create a Figure object as well as a subplot and fill it with a line plot of a random walk.
- By default, matplotlib places the ticks evenly distributed along the data range. Individual ticks can be set as follows.
- By default, there is no axis label or title.
Ticks and labels

[Figure: out/withoutlabels.pdf, a random walk of 1000 steps with default ticks and without axis labels or title]
Ticks and labels

Set ticks and labels
ax.set_xticks([0, 250, 500, 750, 1000])
ax.set_xlabel("Days", fontsize=20)
ax.set_ylabel("Change", fontsize=20)
ax.set_title("Simulation", fontsize=30)
fig.savefig("out/labels.pdf")

- The individual ticks are given as a list to ax.set_xticks().
- The label and title can be set to an individual size using the argument fontsize.
Ticks and labels

[Figure: out/labels.pdf, the random walk with the title "Simulation", the X-label "Days", the Y-label "Change" and X-ticks at 0, 250, 500, 750 and 1000]
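The setters have matching getters, which makes it easy to check what a subplot currently shows. A sketch that repeats the settings from the slide and reads them back; ax.tick_params() is added as one way to change the size of the tick labels themselves. The Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
from numpy.random import randn

fig, ax = plt.subplots(figsize=(15, 10))
ax.plot(randn(1000).cumsum())  # random walk

ax.set_xticks([0, 250, 500, 750, 1000])
ax.set_xlabel("Days", fontsize=20)
ax.set_ylabel("Change", fontsize=20)
ax.set_title("Simulation", fontsize=30)
ax.tick_params(axis="both", labelsize=15)  # font size of the tick labels

print(list(ax.get_xticks()))
print(ax.get_xlabel(), ax.get_ylabel(), ax.get_title())
plt.close(fig)
```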
Legends

If one subplot contains multiple plots, a legend is needed to tell them apart.
- ax.legend(loc): Shows the legend at location loc.
- Some options: "best", "upper right", "center left", ...

Set legend
fig = plt.figure(figsize=(15, 10))
ax = fig.add_subplot(1, 1, 1)
ax.plot(randn(1000).cumsum(), label="first")
ax.plot(randn(1000).cumsum(), label="second")
ax.plot(randn(1000).cumsum(), label="third")
ax.legend(loc="best", fontsize=20)
fig.savefig("out/legend.pdf")

- The legend displays the label and the color of each associated plot.
- With the option "best", the legend is placed at the location where it overlaps the plotted data least.
Legends

[Figure: out/legend.pdf, three random walks with a legend listing "first", "second" and "third"]
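Only plots with a label appear in the legend, and ax.get_legend_handles_labels() shows which ones these are. A sketch that adds an unlabeled fourth random walk and places the legend at a fixed location instead of "best"; ncol spreads the entries over several columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
from numpy.random import randn

fig, ax = plt.subplots(figsize=(15, 10))
for name in ["first", "second", "third"]:
    ax.plot(randn(1000).cumsum(), label=name)
# A plot without label is left out of the legend
ax.plot(randn(1000).cumsum(), color="grey")

handles, labels = ax.get_legend_handles_labels()
print(labels)  # ['first', 'second', 'third']

# A fixed location instead of "best"; ncol spreads the entries over columns
leg = ax.legend(loc="upper left", ncol=3, fontsize=20)
legend_texts = [t.get_text() for t in leg.get_texts()]
plt.close(fig)
```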
Annotations on a subplot

- ax.text(x, y, "text", fontsize): Inserts a text into a subplot.
- ax.annotate("text", xy, xytext, arrowprops): Inserts an arrow with an annotation.

The following code adds annotations to the subplot ax of the legend example.

Annotations
ax.text(400, -30, "here", fontsize=50)
ax.annotate("there",
            fontsize=40,
            xy=(0, 0),
            xytext=(400, 8),
            arrowprops=dict(facecolor="black",
                            shrink=0.05))
ax.set_yticks([-40, -30, -20, -10, 0, 10, 20, 30, 40])
fig.savefig("out/arrow.pdf")

Using ax.annotate(), the arrow head points at xy and the bottom left corner of the text is placed at xytext.
Annotations

[Figure: out/arrow.pdf, the three random walks with the text "here" and an arrow labeled "there" pointing at (0, 0)]
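A self-contained sketch of ax.text() and ax.annotate() on a sine curve; the curve and the positions are illustrative assumptions. The second annotation uses textcoords="offset points", which interprets xytext as an offset in points from xy instead of data coordinates, and arrowprops=dict(arrowstyle="->") draws a thin arrow instead of the filled one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# Free text starting at the data coordinates (4.5, 0.5)
txt = ax.text(4.5, 0.5, "here", fontsize=20)

# Arrow head at xy, text at xytext (both in data coordinates)
peak = (np.pi / 2, 1.0)
ann = ax.annotate("maximum", xy=peak, xytext=(2.5, 0.5),
                  arrowprops=dict(facecolor="black", shrink=0.05))

# xytext as an offset in points from xy, with a thin arrow
ax.annotate("minimum", xy=(3 * np.pi / 2, -1.0), xytext=(-80, 20),
            textcoords="offset points", arrowprops=dict(arrowstyle="->"))

print(ann.get_text(), ann.xy, len(ax.texts))
plt.close(fig)
```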
Annotations

Annotation Lehman
import pandas as pd
from datetime import datetime

date = datetime(2008, 9, 15)
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
dow = pd.read_csv("data/dji.csv", index_col=0, parse_dates=True)
close = dow["Close"]
close.plot(ax=ax)
ax.annotate("Lehman Bankruptcy",
            fontsize=30,
            xy=(date, close.loc[date] + 400),
            xytext=(date, 22000),
            arrowprops=dict(facecolor="red",
                            shrink=0.03))
ax.set_title("Dow Jones Industrial Average", size=40)
fig.savefig("out/lehman.pdf")
Annotations

[Figure: out/lehman.pdf, "Dow Jones Industrial Average", daily closing prices from about 2006 to 2018 (X-axis: Date) with a red arrow labeled "Lehman Bankruptcy" pointing at September 15, 2008]
Drawing on a subplot

- plt.Rectangle((x, y), width, height, angle): Creates a rectangle anchored at (x, y) and rotated counter-clockwise by angle degrees.
- plt.Circle((x, y), radius): Creates a circle.

Drawing
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(1, 1, 1)
ax.set_xticks([0, 1, 2, 3, 4, 5])
ax.set_yticks([0, 1, 2, 3, 4, 5])
rectangle = plt.Rectangle((1.5, 1),
                          width=0.8, height=2,
                          color="red", angle=30)
circ = plt.Circle((3, 3),
                  radius=1, color="blue")
ax.add_patch(rectangle)
ax.add_patch(circ)
fig.savefig("out/draw.pdf")

A list of all available patches can be found in the documentation of the module matplotlib.patches.
Drawing on a subplot

[Figure: out/draw.pdf, a red rectangle rotated by 30 degrees and a blue circle centered at (3, 3) on axes from 0 to 5]
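plt.Rectangle and plt.Circle are the classes Rectangle and Circle from matplotlib.patches, which pyplot makes available. A sketch that adds a third patch type, Polygon, and sets the visible range and an equal aspect ratio explicitly so that the circle is drawn round; the triangle's corner points are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

fig, ax = plt.subplots(figsize=(6, 6))

# Same shapes as on the slide; plt.Rectangle is mpatches.Rectangle
rectangle = mpatches.Rectangle((1.5, 1), width=0.8, height=2, color="red", angle=30)
circ = mpatches.Circle((3, 3), radius=1, color="blue", alpha=0.5)
# Illustrative third shape: a closed polygon through three corner points
triangle = mpatches.Polygon([(0.5, 4), (1.5, 4), (1, 4.8)], closed=True, color="green")

for patch in (rectangle, circ, triangle):
    ax.add_patch(patch)

ax.set_xlim(0, 5)       # visible range set explicitly
ax.set_ylim(0, 5)
ax.set_aspect("equal")  # one unit on X equals one unit on Y, so the circle is round

print(len(ax.patches), circ.get_radius())
plt.close(fig)
```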
Best practice: Visual illustrations

Step 1
Create a Figure object and subplots

Best practice Step 1
fig, ax = plt.subplots(1, 1, figsize=(16, 8))

Step 2
Plot data using different plot types. An overview of plot types can be found in the examples gallery.

Best practice Step 2
x = np.arange(0, 10, 0.1)
y = np.sin(x)
ax.scatter(x, y)
Best practice: Visual illustrations

[Figure: scatter plot of the sine wave for x from 0 to 10]
Best practice: Visual illustrations

Step 3
Set colors, markers and line styles (this refines the scatter call from Step 2)

Best practice Step 3
ax.scatter(x, y, color="green", marker="s")

Step 4
Set title, axis labels and ticks

Best practice Step 4
ax.set_title("Sine wave", fontsize=30)
ax.set_xticks([0, 2.5, 5, 7.5, 10])
ax.set_yticks([-1, 0, 1])
ax.set_ylabel("y-value", fontsize=20)
ax.set_xlabel("x-value", fontsize=20)
Best practice: Visual illustrations

[Figure: green square markers of the sine wave with the title "Sine wave", the axis labels "x-value" and "y-value", X-ticks from 0.0 to 10.0 and Y-ticks -1, 0 and 1]
Best practice: Visual illustrations

Step 5
Set labels

Best practice Step 5
ax.scatter(x, y, color="green", marker="s", label="Sine")

Step 6
Set legend (if you add another plot to the same subplot)

Best practice Step 6
ax.plot(np.arange(11) / 10, color="blue", linestyle="-",
        label="Linear")
ax.legend(fontsize=20)

Step 7
Save plot to file

Best practice Step 7
fig.savefig("out/sinewave.pdf")
Best practice: Visual illustrations

[Figure: out/sinewave.pdf, the sine wave plot with a legend listing "Sine" and "Linear"]
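Steps 1 to 7 combined into one script, as a sketch: the style and label arguments from Steps 3 and 5 go directly into a single ax.scatter() call, so the sine data is drawn only once. The Agg backend and a temporary folder in place of out are assumptions that let the script run anywhere.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Step 1: Figure object and subplot
fig, ax = plt.subplots(1, 1, figsize=(16, 8))

# Steps 2, 3 and 5: data, plot type, style and label in one call
x = np.arange(0, 10, 0.1)
y = np.sin(x)
ax.scatter(x, y, color="green", marker="s", label="Sine")

# Step 4: title, axis labels and ticks
ax.set_title("Sine wave", fontsize=30)
ax.set_xticks([0, 2.5, 5, 7.5, 10])
ax.set_yticks([-1, 0, 1])
ax.set_xlabel("x-value", fontsize=20)
ax.set_ylabel("y-value", fontsize=20)

# Step 6: second plot and legend
ax.plot(np.arange(11) / 10, color="blue", linestyle="-", label="Linear")
ax.legend(fontsize=20)

# Step 7: save; a temporary folder stands in for "out"
path = os.path.join(tempfile.mkdtemp(), "sinewave.pdf")
fig.savefig(path)
plt.close(fig)
```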
Section 4.4

Visual illustrations: Pandas layers
Plotting with layers

Plotting with matplotlib is often tedious and requires some research: you need to recall parameter details to create professional charts. For recurring, everyday tasks, you might prefer another level of abstraction. Layer frameworks, which operate on top of matplotlib, produce good-looking results with short methods and less code. Two popular packages are:
- pandas provides a convenient layer with frequently needed plotting methods for its objects, such as Series and DataFrames.
- Seaborn is a powerful graphics framework that allows you to easily create beautiful, complex graphics using a simple interface.

→ In this section, we will have a look at pandas' integrated layer methods. However, Seaborn also works very well with pandas objects.
Line plots

DataFrame/Series.plot(): Plots a DataFrame or a Series.

Simple line plot
plt.close("all")
p = pd.Series(np.random.rand(10).cumsum(),
              index=np.arange(0, 1000, 100))
p

## 0      0.669761
## 100    0.989702
## 200    1.655715
## 300    1.966073
## 400    2.151883
## 500    2.776987
## 600    2.839751
## 700    3.188431
## 800    4.169061
## 900    4.923286
## dtype: float64

p.plot()
plt.savefig("out/line.pdf")
Line plots

[Figure: out/line.pdf, line plot of the Series p with the index values 0 to 900 on the X-axis]
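pandas' plot() is a thin layer over matplotlib: it returns the matplotlib subplot (Axes) it drew into, so everything from the previous sections still applies. A sketch showing that the Series index becomes the X-values; color and title are passed through as ordinary arguments, and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

p = pd.Series(np.random.rand(10).cumsum(), index=np.arange(0, 1000, 100))

# Series.plot() draws a line plot and returns the matplotlib subplot it used
ax = p.plot(color="darkcyan", title="Cumulative sum")
line = ax.get_lines()[0]

# The index supplies the X-values, the Series values the Y-values
x_values = np.asarray(line.get_xdata())
y_values = np.asarray(line.get_ydata())
print(x_values)
print(type(ax).__name__, ax.get_title())
plt.close("all")
```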
Line plots

Line plots
df = pd.DataFrame(np.random.randn(10, 3), index=np.arange(10),
                  columns=["a", "b", "c"])
df

##           a         b         c
## 0  1.703615 -1.376905 -1.336154
## 1 -1.402924  0.812501  1.739143
## 2  0.593504  0.699582  0.423217
## 3  1.140647 -1.454363  0.250578
## 4 -0.044809  0.438279 -0.821514
## 5  1.897959 -0.254581  0.157704
## 6  0.782639  1.196116  0.763081
## 7  0.577947  1.815039  1.175842
## 8 -0.278585 -0.538956  0.102930
## 9 -0.091891  0.310788 -0.857167

df.plot(figsize=(15, 12))
plt.savefig("out/line2.pdf")
Line plots

[Figure: out/line2.pdf, line plots of the columns a, b and c with a legend]
Plotting and pandas

The plot method applied to a DataFrame plots each column as a separate line and shows the legend automatically. When plotting DataFrames, there are several arguments to change the style of the plot:

Argument    Description
kind        "line", "bar", etc.
logy        Logarithmic scale on the Y-axis
use_index   If True, use the index for tick labels
rot         Rotation of tick labels
xticks      Values for X-ticks
yticks      Values for Y-ticks
grid        Set grid True or False
xlim        X-axis limits
ylim        Y-axis limits
subplots    Plot each DataFrame column in a new subplot

Table: Pandas plot arguments
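A sketch that combines several arguments from the table in one call. The two columns are illustrative: a linear and an exponential series, where logy=True turns the exponential curve into a straight line. The returned matplotlib subplot confirms the settings.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({"linear": np.arange(1, 11),
                   "exponential": 2.0 ** np.arange(10)},
                  index=np.arange(10))

# kind, logy, grid, rot, xlim and xticks from the table in one call
ax = df.plot(kind="line", logy=True, grid=True, rot=45,
             xlim=(0, 9), xticks=[0, 3, 6, 9])

print(ax.get_yscale())  # log
print(ax.get_xlim(), list(ax.get_xticks()))
plt.close("all")
```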
Pandas plot

Separated line plots
df.plot(grid=True, rot=45, subplots=True, title="Example",
        figsize=(15, 10))
plt.savefig("out/pandas.pdf")

[Figure: out/pandas.pdf, titled "Example", the columns a, b and c each in a separate subplot with grid lines and rotated tick labels]
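With subplots=True, df.plot() returns a NumPy array with one matplotlib subplot per column, and each element can be refined like any other subplot. A sketch, assuming the Agg backend; the second call shows the layout argument (rows, columns) together with sharey, with illustrative sizes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 3), index=np.arange(10),
                  columns=["a", "b", "c"])

# subplots=True: one subplot per column, returned as a NumPy array
axes = df.plot(grid=True, rot=45, subplots=True, title="Example",
               figsize=(15, 10))
print(type(axes).__name__, len(axes))

# Each element is an ordinary matplotlib subplot
axes[0].set_ylabel("column a")

# layout=(rows, columns) arranges the subplots in a grid; sharey aligns the Y-axes
axes_grid = df.plot(subplots=True, layout=(1, 3), sharey=True, figsize=(15, 4))
print(axes_grid.shape)  # (1, 3)
plt.close("all")
```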
Standard creation of plots and pandas

dataframe.plot(ax=subplot): Plots a DataFrame into the given subplot.

Standard creation
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(1, 1, 1)
guests = np.array([[1334, 456], [1243, 597], [1477, 505],
                   [1502, 404], [854, 512], [682, 0]])
canteen = pd.DataFrame(guests,
                       index=["Mon", "Tue", "Wed",
                              "Thu", "Fri", "Sat"],
                       columns=["Zentral", "Turm"])
canteen

##      Zentral  Turm
## Mon     1334   456
## Tue     1243   597
## Wed     1477   505
## Thu     1502   404
## Fri      854   512
## Sat      682     0
Standard creation of plots and pandas

Bar plot
canteen.plot(ax=ax, kind="bar")
ax.set_ylabel("guests", fontsize=20)
ax.set_title("Canteen use in Göttingen", fontsize=20)
fig.savefig("out/canteen.pdf")

- The bar plot resides in the subplot ax.
- The label and title are set as shown before, without using pandas.

Bar plot

[Figure: bar chart "Canteen use in Göttingen", guests (y-axis) per weekday Mon to Sat, one bar each for Zentral and Turm]
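The bar plot above can be reproduced as a self-contained sketch. It assumes the off-screen Agg backend and writes the PDF into an in-memory buffer instead of out/canteen.pdf. The checks at the end show that pandas draws one bar per weekday and column into the given subplot and returns that same subplot.

```python
# Sketch: the canteen bar plot, rendered off-screen.
import io

import matplotlib
matplotlib.use("Agg")  # assumption: no display available
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

guests = np.array([[1334, 456], [1243, 597], [1477, 505],
                   [1502, 404], [854, 512], [682, 0]])
canteen = pd.DataFrame(guests,
                       index=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"],
                       columns=["Zentral", "Turm"])

fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(1, 1, 1)
returned_ax = canteen.plot(ax=ax, kind="bar")  # draws into the given subplot
ax.set_ylabel("guests", fontsize=20)
ax.set_title("Canteen use in Göttingen", fontsize=20)

# One rectangle per (weekday, column) pair: 6 days x 2 canteens
n_bars = len(ax.patches)

buffer = io.BytesIO()  # in memory instead of "out/canteen.pdf"
fig.savefig(buffer, format="pdf")
plt.close(fig)
```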

Bar plot

Bar plot - stacked
fig = plt.figure(figsize=(6, 6))  # new figure, otherwise the bars are drawn over the previous plot
ax = fig.add_subplot(1, 1, 1)
canteen.plot(ax=ax, kind="bar", stacked=True)
ax.set_ylabel("guests", fontsize=20)
ax.set_title("Canteen use in Göttingen", fontsize=20)
fig.savefig("out/canteenstacked.pdf")

Bar plot

[Figure: stacked bar chart "Canteen use in Göttingen", guests (y-axis) per weekday Mon to Sat, Turm stacked on top of Zentral]

Plot financial data

BTC chart
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
ax.set_ylabel("price", fontsize=20)
ax.set_xlabel("Date", fontsize=20)
BTC = pd.read_csv("data/btc-eur.csv", index_col=0, parse_dates=True)
BTCclose = BTC["Close"]
BTCclose.plot(ax=ax)
ax.set_title("BTC-EUR", fontsize=20)
fig.savefig("out/btc.pdf")

Plot financial data

[Figure: line chart "BTC-EUR", closing price (y-axis: price) over time (x-axis: Date, ticks 2012 to 2019)]
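To see what index_col=0 and parse_dates=True do without the course data file, here is a sketch that reads a tiny CSV from memory. The column names Date, Open and Close and all prices are illustrative assumptions, not the contents of data/btc-eur.csv.

```python
# Sketch: reading a CSV with a date index from an in-memory string.
import io

import pandas as pd

# Made-up placeholder prices, for illustration only
csv_text = """Date,Open,Close
2019-01-01,3200.0,3350.0
2019-01-02,3350.0,3300.0
2019-01-03,3300.0,3400.0
"""

# index_col=0 uses the first column as index; parse_dates=True turns it
# into a DatetimeIndex
BTC = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
BTCclose = BTC["Close"]  # a single column is a Series indexed by dates
```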

Plot financial data

Compare - bad illustration
amazon = pd.read_csv("data/amzn.csv", index_col=0,
                     parse_dates=True)["Close"]
siemens = pd.read_csv("data/sie.de.csv", index_col=0,
                      parse_dates=True)["Close"]
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
ax.set_ylabel("price")
amazon.plot(ax=ax, label="Amazon")
siemens.plot(ax=ax, label="Siemens")
ax.legend(loc="best")
fig.savefig("out/compare.pdf")

In this illustration you can hardly compare the trends of the two stocks, because their price levels differ widely.
Using pandas, you can standardize each series to a common starting value in one line.

Plot financial data

[Figure: closing prices of Amazon and Siemens (y-axis: price) over time (x-axis: Date, ticks 2017-03 to 2018-03)]

Plot financial data

Compare - good illustration
amazon = amazon / amazon.iloc[0] * 100  # first value by position, not by date label
siemens = siemens / siemens.iloc[0] * 100
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
ax.set_ylabel("percentage")
amazon.plot(ax=ax, label="Amazon")
siemens.plot(ax=ax, label="Siemens")
ax.legend(loc="best")
fig.savefig("out/comparenew.pdf")

Plot financial data

[Figure: Amazon and Siemens closing prices rebased to 100 at the first observation (y-axis: percentage) over time (x-axis: Date, ticks 2017-03 to 2018-03)]
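You can check the rebasing step on small made-up series. The prices below are placeholders, not real Amazon or Siemens quotes. The sketch also shows the one-line variant for a DataFrame that holds both stocks as columns.

```python
# Sketch: rebase price series to 100 at their first observation.
import pandas as pd

dates = pd.date_range("2017-03-01", periods=4, freq="D")
amazon = pd.Series([850.0, 867.0, 901.0, 935.0], index=dates)  # placeholder values
siemens = pd.Series([125.0, 120.0, 130.0, 135.0], index=dates)  # placeholder values

# .iloc[0] selects the first value by position, independent of the date index
amazon_idx = amazon / amazon.iloc[0] * 100
siemens_idx = siemens / siemens.iloc[0] * 100

# DataFrame variant in one line: every column is divided by its own
# first value (the Series prices.iloc[0] aligns on the column labels)
prices = pd.DataFrame({"Amazon": amazon, "Siemens": siemens})
rebased = prices / prices.iloc[0] * 100
```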

Chapter 5

Applications
5.1 Time series
5.2 Moving window
5.3 Financial applications
5.4 Optimization

Section 5.1

Applications
Time series

Date and time data types

Data types for date and time are included in the Python standard library.

Datetime creation
from datetime import datetime
now = datetime.now()
now

## datetime.datetime(2022, 2, 14, 0, 36, 9, 153276)

now.day

## 14

now.hour

## 0

A datetime object provides the attributes year, month, day, hour, minute, second and microsecond.
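The sketch below reads every attribute listed above from one datetime. It uses a fixed timestamp (the value of now printed on the slide) instead of datetime.now(), so the results are reproducible.

```python
# Sketch: reading the components of a fixed datetime.
from datetime import datetime

stamp = datetime(2022, 2, 14, 0, 36, 9, 153276)

parts = {
    "year": stamp.year,
    "month": stamp.month,
    "day": stamp.day,
    "hour": stamp.hour,
    "minute": stamp.minute,
    "second": stamp.second,
    "microsecond": stamp.microsecond,
}
for name, value in parts.items():
    print(f"{name:>11}: {value}")
```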

Set datetime

datetime(year, month, day, ..., microsecond): Creates a datetime object for the given date and time.

Datetime representation
holiday = datetime(2020, 12, 24, 8, 30)
holiday

## datetime.datetime(2020, 12, 24, 8, 30)

exam = datetime(2020, 12, 9, 10)
print("The exam will be on the " + "{:%Y-%m-%d}".format(exam))

## The exam will be on the 2020-12-09
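The date format spec used with str.format also works in an f-string and in strftime. This short sketch compares the three and assumes Python 3.6 or later for f-strings.

```python
# Sketch: three equivalent ways to render a datetime as YYYY-MM-DD.
from datetime import datetime

exam = datetime(2020, 12, 9, 10)

via_format = "{:%Y-%m-%d}".format(exam)   # str.format with a date spec
via_fstring = f"{exam:%Y-%m-%d}"          # f-string, same format spec
via_strftime = exam.strftime("%Y-%m-%d")  # explicit method call

print(f"The exam will be on {via_fstring}.")
```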

Time difference

timedelta(days, seconds, microseconds): Represents a duration, e.g. the difference between two datetime objects.

Datetime difference
from datetime import timedelta
delta = exam - now
delta

## datetime.timedelta(days=-432, seconds=33830, microseconds=846724)

print("The exam will take place in " + str(delta.days) + " days.")

## The exam will take place in -432 days.

The result is negative because the exam date already lies in the past.

now

## datetime.datetime(2022, 2, 14, 0, 36, 9, 153276)

now + timedelta(10, 120)  # 10 days and 120 seconds later

## datetime.datetime(2022, 2, 24, 0, 38, 9, 153276)
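This sketch repeats the timedelta computations with now fixed to the timestamp shown on the slide, so you can reproduce the numbers. It also shows the keyword form of timedelta, which is easier to read than the positional timedelta(10, 120).

```python
# Sketch: timedelta arithmetic with a fixed "now".
from datetime import datetime, timedelta

now = datetime(2022, 2, 14, 0, 36, 9, 153276)
exam = datetime(2020, 12, 9, 10)

delta = exam - now  # negative, because the exam lies in the past
# timedelta normalises to days, seconds (0 <= seconds < 86400) and
# microseconds, which is why days is -432 while seconds is positive.
total = delta.total_seconds()

later = now + timedelta(10, 120)             # positional: days, seconds
same = now + timedelta(days=10, minutes=2)   # keyword form, same duration
```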

Convert string and datetime

datetime.strftime("format"): Converts a datetime object into a string.
datetime.strptime(datestring, "format"): Converts a date string into a datetime object.

Convert Datetime
stamp = datetime(2020, 4, 12)
stamp

## datetime.datetime(2020, 4, 12, 0, 0)

print("German date format: " + stamp.strftime("%d.%m.%Y"))

## German date format: 12.04.2020

val = "2020-5-5"
d = datetime.strptime(val, "%Y-%m-%d")
d

## datetime.datetime(2020, 5, 5, 0, 0)

When parsing, %m and %d also accept single-digit values such as 5.
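You can chain strptime and strftime to convert between notations. The helper names german_to_iso and iso_to_german are hypothetical and were chosen for this sketch. A string that does not match the given format raises a ValueError.

```python
# Sketch: converting between German and ISO date notation.
from datetime import datetime


def german_to_iso(text):
    """Convert 'DD.MM.YYYY' to 'YYYY-MM-DD' (hypothetical helper)."""
    return datetime.strptime(text, "%d.%m.%Y").strftime("%Y-%m-%d")


def iso_to_german(text):
    """Convert 'YYYY-MM-DD' to 'DD.MM.YYYY' (hypothetical helper)."""
    return datetime.strptime(text, "%Y-%m-%d").strftime("%d.%m.%Y")


print(german_to_iso("12.04.2020"))
print(iso_to_german("2020-5-5"))
```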

Convert string and datetime

Converting examples
val = "31.01.2012"
d = datetime.strptime(val, "%d.%m.%Y")
d

## datetime.datetime(2012, 1, 31, 0, 0)

now.strftime("Today is %A and we are in week %W of the year %Y.")

## 'Today is Monday and we are in week 07 of the year 2022.'

now.strftime("%c")

## 'Mon Feb 14 00:36:09 2022'
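%W is only one way to count weeks. This sketch contrasts it with the ISO 8601 week returned by datetime.isocalendar(). The two can differ, for example around New Year.

```python
# Sketch: %W week numbers versus ISO 8601 weeks.
from datetime import datetime

# %W: weeks start on Monday; days before the first Monday are week 00
day = datetime(2022, 2, 14)
week_w = day.strftime("%W")
iso_week = day.isocalendar()[1]  # ISO week number

new_year = datetime(2022, 1, 1)  # a Saturday, before the first Monday
print(new_year.strftime("%W"), new_year.isocalendar()[1])
```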

Overview: Datetime formats

Type  Description
%Y    4-digit year
%m    2-digit month [01, 12]
%d    2-digit day [01, 31]
%H    Hour (24-hour clock) [00, 23]
%I    Hour (12-hour clock) [01, 12]
%M    2-digit minute [00, 59]
%S    2-digit second [00, 59]
%W    Week number of the year [00, 53]
%F    Shortcut for %Y-%m-%d (platform-dependent)
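This sketch applies the numeric codes from the table to one fixed timestamp and then parses the result back with strptime. It leaves out %F because that code is platform-dependent and may not be available on every system.

```python
# Sketch: numeric format codes on a fixed timestamp, plus a round trip.
from datetime import datetime

t = datetime(2022, 2, 14, 15, 36, 9)

codes = ["%Y", "%m", "%d", "%H", "%I", "%M", "%S", "%W"]
rendered = {code: t.strftime(code) for code in codes}
for code, text in rendered.items():
    print(code, text)

# Parsing with the same codes reverses the formatting
stamp = t.strftime("%Y-%m-%d %H:%M:%S")
parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
```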

Overview: Datetime formats

Type  Description
%a    Abbreviated weekday name
%A    Full weekday name
%b    Abbreviated month name
%B    Full month name
%c    Locale-appropriate full date and time
%x    Locale-appropriate formatted date
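The name-based codes depend on the active locale. This sketch assumes Python's default C locale for date and time formatting (that is, no locale.setlocale call), which produces English names.

```python
# Sketch: name-based format codes (output depends on the locale).
from datetime import datetime

t = datetime(2022, 2, 14, 0, 36, 9)

names = {code: t.strftime(code) for code in ["%a", "%A", "%b", "%B"]}
print(names)
print(t.strftime("%c"))  # full date and time, locale-dependent
print(t.strftime("%x"))  # date only, locale-dependent
```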
