Python Basics Handbook PDF
1 Introduction
1.1 What is Python?
1.2 Where is Python used?
1.3 Why Python?
1.4 History of Python
1.5 Python 3 versus Python 2
1.6 Key Takeaways
3.2.2 Float
3.2.3 Boolean
3.2.4 String
3.2.5 Operations on String
3.2.6 type() function
3.3 Type Conversion
3.4 Key Takeaways
5 Data Structures
5.1 Indexing and Slicing
5.2 Array
5.2.1 Visualizing an Array
5.2.2 Accessing Array Element
5.2.3 Manipulating Arrays
5.3 Tuples
5.3.1 Accessing tuple elements
5.3.2 Immutability
5.3.3 Concatenating Tuples
5.3.4 Unpacking Tuples
5.3.5 Tuple methods
5.4 Lists
5.4.1 Accessing List Items
5.4.2 Updating Lists
5.4.3 List Manipulation
5.4.4 Stacks and Queues
5.5 Dictionaries
5.5.1 Creating and accessing dictionaries
5.5.2 Altering dictionaries
5.5.3 Dictionary Methods
5.6 Sets
5.7 Key Takeaways
10.6.1 Indexing and Subsetting
10.6.2 Boolean Indexing
10.6.3 Iterating Over Arrays
10.7 Key Takeaways
11.7.13 The .shift() function
11.8 Statistical Exploratory data analysis
11.8.1 The info() function
11.8.2 The describe() function
11.8.3 The value_counts() function
11.8.4 The mean() function
11.8.5 The std() function
11.9 Filtering Pandas DataFrame
11.10 Iterating Pandas DataFrame
11.11 Merge, Append and Concat Pandas DataFrame
11.12 TimeSeries in Pandas
11.12.1 Indexing Pandas TimeSeries
11.12.2 Resampling Pandas TimeSeries
11.12.3 Manipulating TimeSeries
11.13 Key Takeaways
Preface
We find that based on the domain-specific needs of our community, Python
is a near-perfect fit. It’s a high-level programming language that is
relatively easy to learn. When used in conjunction with some of its
libraries, it’s incredibly powerful.
• anyone who wants a brief introduction to Python and the key components
of its data science stack, and
• Python programmers who want a quick refresher on using Python for
data analysis.
illustrative examples we use are associated with the financial markets. We
like to think we have done a satisfactory job of meeting the goals we set out
to achieve.
1. Read the book sequentially at your own pace from beginning to end.
Ideally, you should read the chapters before or soon after you
attend/watch the relevant EPAT lectures. It will certainly help in
developing intuitions on new concepts that you pick up.
2. Blaze through the book linearly to get a big picture view of all the
areas covered. You can then concentrate on the different parts based
on what you find harder or what is more important for your work.
We believe there is value to be had with any of these approaches, and each
of us needs to assess what works best for us based on our learning style.
Python has been around for about three decades now. There are several
excellent books, videos, online courses, and blogs covering it from various
angles and directed at different kinds of users. However, the core set of
ideas and concepts is well understood and covered by most of them.
Copyright License
This work is licensed under the Creative Commons Attribution-ShareAlike
4.0 International License2 .
2 http://creativecommons.org/licenses/by-sa/4.0/
That is why you see this image here. In essence, it means that you can
use, share, or improve upon this work (even commercially) as long as you
provide attribution to us. To put things in perspective, Wikipedia3 also uses
the same license.
Acknowledgments
Jay Parmar, Mario Pisa Pena, and Vivek Krishnamoorthy are the authors of
this book. Jay’s done the lion’s share of the writing and formatting. Mario’s
written some sections and reviewed most of the others. Vivek was the
principal conspirator in hatching the book-writing plan to ease a student’s
learning journey as far as possible. He was also involved in the writing,
the editing, the review, and in overseeing this venture.
Our debts in the writing of this book are many, and we spell them out now.
Bear with us.
We have learned a great deal from the writings of experts in the
investor/trader community and the Python community on online Q&A
forums like stackoverflow.com, quora.com, and others, and are indebted to
them. A special shout-out to Dave Bergstrom (Twitter handle @Dburgh),
Matt Harrison (Twitter handle @__mharrison__) and PlanB (Twitter handle
@100trillionUSD) for their comments on our work.
3 https://en.wikipedia.org/wiki/Main_Page
We are also grateful to the helpful and supportive team members of
QuantInsti. Many of them worked uncomplainingly on tight timelines
and despite our badgering (or perhaps because :)), gave us insightful
suggestions.
Finally, we would like to thank all the students we have taught in the past
several years. A special thanks to those of you who endured our first few
iterations of the lectures before we learned how best to teach it. Dear
students, we exist because you exist. You have inspired us, challenged us, and
pushed us never to stop learning just to keep up with you. We hope you
enjoy reading this as much as we enjoyed writing it for you.
Chapter 1
Introduction
Python is an interpreted, object-oriented, high-level programming
language with dynamic semantics. Its high-level built in data
structures, combined with dynamic typing and dynamic binding, make it
very attractive for Rapid Application Development, as well as for use
as a scripting or glue language to connect existing components
together. Python’s simple, easy to learn syntax emphasizes readability
and therefore reduces the cost of program maintenance. Python
supports modules and packages, which encourages program modularity
and code reuse. The Python interpreter and the extensive standard
library are available in source or binary form without charge for all
major platforms and can be freely distributed.
• Web and Internet development: Python is used on the server side to
create web applications.
• Software development: Python is used to create GUI applications,
connect to databases, etc.
• Scientific and Numeric applications: Python is used to handle big data
and perform complex mathematics.
• Education: Python is a great language for teaching programming, both
at the introductory level and in more advanced courses.
• Desktop GUIs: The Tk GUI library2 included with most binary
distributions of Python is used extensively to build desktop applications.
• Business Applications: Python is also used to build ERP and
e-commerce systems.
• Simple
– Compared to many other programming languages, coding in
Python is like writing simple strict English sentences. In fact,
one of its oft-touted strengths is how Python code appears like
pseudo-code. It allows us to concentrate on the solution to the
problem rather than the language itself.
• Easy to Learn
– As we will see, Python has a gentler learning curve (compared
to languages like C, Java, etc.) due to its simple syntax.
• Free and Open Source
– Python and the majority of supporting libraries available are
open source and generally come with flexible and open licenses.
It is an example of FLOSS (Free/Libre and Open Source
Software). In layman’s terms, we can freely distribute copies of open
source software, access its source code, make changes to it, and
use it in new free programs.
• High-level
– Python is a programming language with strong abstraction from
the details of the underlying platform or the machine. In
contrast to low-level programming languages, it uses natural
language elements, is easier to use, and automates significant
areas of computing systems such as resource allocation. This
simplifies the development process when compared to a lower-level
language. When we write programs in Python, we never need to
bother about lower-level details such as managing the memory
used by the programs we write.
• Dynamically Typed
– Types of variables, objects, etc. in Python are generally inferred
during runtime and not statically assigned/declared as in most
compiled languages such as C or Fortran.
• Portable/Platform Independent/Cross Platform
– Being open source and supported across multiple platforms,
Python can be ported to Windows, Linux and Mac OS. All Python
programs can work on any of these platforms without requiring
any changes at all if we are careful in avoiding any
platform-specific dependency. It is used to run powerful servers
and also small devices like the Raspberry Pi3 .
– Python supports various programming and implementation
paradigms, such as Object Oriented, Functional, or Procedural
programming.
• Extensible
– If we need some piece of code to run fast, we can write that part
of the code in C or C++ and then use it via our Python program.
Conversely, we can embed Python code in a C/C++ program to
give it scripting capabilities.
• Extensive Libraries
– The Python Standard Library4 is huge, and it offers a wide range
of facilities. It contains built-in modules written in C that
provide access to system functionality such as I/O operations, as
well as modules written in Python that provide standardized
solutions for many problems that occur in everyday programming.
Some of these modules are listed below:
* Text Processing Modules
* Data Types
* Numeric and Mathematical Modules
* Files and Directory Modules
* Cryptographic Modules
* Generic Operating System Modules
* Networking Modules
* Internet Protocols and Support Modules
* Multimedia Services
* Graphical User Interfaces with Tk
* Debugging and Profiling
* Software Development, Packaging and Distribution
– In addition to the Python Standard Library, we have various
other third-party libraries which can be accessed from Python
Package Index5 .
• Garbage Collection
– Python takes care of memory allocation and deallocation on its
own. In other words, a programmer does not have to manage
memory allocation or preallocate and deallocate memory before
constructing variables and objects. Additionally, Python
provides a Garbage Collector6 interface to handle garbage
collection.
4 https://docs.python.org/3/library/
5 https://pypi.org/
Python is the brainchild of Guido van Rossum, who started its development
in the 1980s. Its name has nothing to do with anything serpentine; it is
in fact inspired by the British comedy Monty Python! The first
Python implementation was in December 1989 in the Netherlands. Since
then, Python has gone through several major revisions. The following
can be considered milestones in the development of Python:
Newcomers often find it confusing that there are two major versions,
2.x and 3.x, which have been developed and used in parallel since 2008.
This will likely persist for a while since both versions are quite
popular and used extensively in the scientific and software development
community. One point to note is that the two versions are not entirely
code-compatible. We can develop programs and write code in either
version, but there will be syntactical and other differences. This handbook is
based on the 3.x version, but we believe most of the code examples should
work with version 2.x as well with some minor tweaks.
6 https://docs.python.org/3/library/gc.html
1.5 Python 3 versus Python 2
The first version of Python 3.x was released at the end of 2008. It made
changes that rendered some of the old Python 2.x code incompatible. In this
section, we will discuss the differences between the two versions. However,
before moving further, one might wonder why Python 3 and not Python 2.
The most compelling reason for porting to Python 3 is that Python 2.x will
not be developed after 2020. So it’s no longer a good idea to start new projects
in Python 2.x. There won’t ever be a Python 2.8. Also, Python 2.7 will only
get security updates from the Python 3 development branch. That being
said, most of the code we write will work on either version with some
small caveats.
We now discuss some of the significant changes between the two versions.
5 / 2
– The answer we expect here is 2.5, but Python 2 returns only 2.
Following the core value mentioned above, Python returns
output of the same type as the input. Here, both inputs are
integers, so Python 2 returns an integer.
– This has been fixed in Python 3, which outputs 2.5 for the
above expression. In fact, the / operator gives a float output
for every division operation.
• Print Function
• Input Function
if an integer such as 123 is entered, it is treated as an integer
without being converted to a string. If a string is entered
for input(), Python 2 will throw an error.
– In Python 3, raw_input() is gone and input() no longer evaluates
the data it receives. We always get back a string whatever
the input may be.
• Error Handling
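The division and input() behaviour that the bullets above describe for Python 3 can be sketched as follows. This is only an illustration: raw_value stands in for the text a user would type at an input() prompt and is an assumption, not part of the original examples.

```python
# True division with / always returns a float in Python 3.
true_quotient = 5 / 2
# Floor division with // keeps only the whole part for integer operands.
floor_quotient = 5 // 2

# input() always returns a string in Python 3. raw_value is a hypothetical
# stand-in for what input() would return if the user typed 123.
raw_value = '123'
quantity = int(raw_value)  # explicit conversion is needed to get an integer

print(true_quotient, floor_quotient, type(raw_value).__name__, quantity)
```

Converting explicitly with int() makes the intent visible, instead of relying on the implicit evaluation that Python 2's input() performed.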
Chapter 2
# Addition
In []: 5 + 3
Out[]: 8
# Subtraction
In []: 5 - 3
Out[]: 2
# Multiplication
In []: 5 * 3
Out[]: 15
# Division
In []: 5 / 3
Out[]: 1.6666666666666667
# Modulo
In []: 5 % 2
Out[]: 1
NOTE : The content after the # symbol is a comment and can be
ignored when typing the examples. We will examine comments
in more detail in the later sections. Here, In refers to an input
provided to the Python interpreter and Out represents the output
returned by the interpreter. Here we use the IPython console
to perform the above mathematical operations. They can
also be performed in the Python IDLE (Integrated Development
and Learning Environment) (aka The Shell), the Python console,
or Jupyter notebook in a similar fashion. Basically, we have a
host of interfaces to choose from and programmers choose what
they find most comfortable. We will stick to the Python Console
interface to write and run our Python code in this handbook.
To be clear, each of the above-mentioned interfaces connects us
to the Python interpreter (which does the computational heavy
lifting behind the scenes).
Similar to the / division operator, we also have the // integer division
operator. The key difference is that the former outputs a decimal value
known as a float, as seen in the above example, while the latter rounds
the quotient down and, for integer operands, outputs an integer value,
i.e. one without any fractional part. We will discuss the float and
integer data types in more detail in the upcoming sections. Below is an
example of an integer division where Python returns the output value
without any decimals.
In []: 5 // 3
Out[]: 1
# Composite expression
In []: 5 + 3 - 3 + 4
Out[]: 9
In the example above, the order of evaluation is from left to right, resulting
in the expression 5 + 3 evaluating first. Its value 8 is then combined with
the next operand 3 by the - operator, evaluating to the value 5 of the
composite expression 5 + 3 - 3. This value is in turn combined with the
last literal 4 by the + operator, ending up with the value 9 for the whole
expression.
In the example, operators are applied from left to right, because - and +
have the same priority. In an expression with more than one operator, the
operators need not all have the same priority. Consider the following
example:
In []: 5 + 3 * 3 - 4
Out[]: 10
Here, the expression above evaluated to 10, because the * operator has a
higher priority compared to the - and + operators. The expression 3 * 3 is
evaluated first, resulting in the value 9, which is combined with the
operand 5 by the operator +, producing the value 14. This value is in
turn combined with the next operand 4 by the operator -, which results in
the final value of 10. The order in which operators are applied is called
operator precedence. In Python, mathematical operators follow the natural
precedence observed in mathematics.
# Brackets
In []: (5 + 3) * (3 - 4)
Out[]: -8
In the examples above, an operator connects two operands, and hence they
are called binary operators. In contrast, operators can also be unary, which
take only one operand. One such operator is -, known as negation.
# Negation
In []: - (5 + 3)
Out[]: -8
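The precedence rules and the unary negation operator discussed above interact in one way that is easy to overlook. The sketch below is illustrative only; the variable names are ours, and it uses the ** power operator, which binds more tightly than unary negation.

```python
# Multiplication is applied before addition and subtraction.
product_first = 5 + 3 * 3 - 4

# Brackets override the default precedence.
grouped = (5 + 3) * (3 - 4)

# ** binds tighter than unary -, so this is -(5 ** 2).
negated_power = -5 ** 2

# Brackets make the negative number the base of the power.
negated_base = (-5) ** 2

print(product_first, grouped, negated_power, negated_base)
```

When in doubt, adding brackets costs nothing and makes the intended order of evaluation explicit.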
# Floating Point Multiplication
In []: 5.0 * 3
Out[]: 15.0
For the above example, the last part calculates the positive square root of 36.
In []: type(5)
Out[]: int
In []: type(5.0)
Out[]: float
In []: type(5.0 ** 3)
Out[]: float
We can also convert the type of an argument using the following built-in
functions. For example,
In []: float(5)
Out[]: 5.0
In []: type(float(5))
Out[]: float
In []: int(5.9)
Out[]: 5
In []: type(int(5.9))
Out[]: int
As can be seen in the above example, using the float function call, which
takes a single argument, we can convert an integer input to a float value.
Also, we cross verify it by using the type function. Likewise, we have an
int function using which we can change a float input to an integer value.
During the conversion, the int function simply discards the fractional part
of the input value. In the last example, the return value of int(5.9) is 5,
even though 5.9 is numerically closer to the integer 6. To convert a float
to the nearest integer instead, we can use the round function.
In []: round(5.9)
Out[]: 6
In []: round(5.2)
Out[]: 5
A call to the round function returns the integer numerically closest to the
argument. It can also round to a specific number of digits after the
decimal point. We then need to specify two arguments in the
function call, with the second argument specifying the number of digits to
keep after the decimal point. The following examples illustrate this. A
comma is used to separate the arguments in the function call.
In []: round(5.98765, 2)
Out[]: 5.99
In []: round(5.98765, 1)
Out[]: 6.0
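Two details of int and round are worth seeing side by side with the examples above. The following is a small sketch with variable names of our own choosing: int truncates toward zero (which matters for negative numbers), and Python 3's round sends values exactly halfway between two integers to the nearest even integer.

```python
# int() discards the fractional part, i.e. it truncates toward zero.
truncated_positive = int(5.9)
truncated_negative = int(-5.9)

# round() returns the nearest integer.
nearest = round(5.9)

# Exact halves are rounded to the nearest even integer in Python 3.
half_to_even_down = round(2.5)
half_to_even_up = round(3.5)

# A second argument keeps that many digits after the decimal point.
two_digits = round(5.98765, 2)

print(truncated_positive, truncated_negative, nearest,
      half_to_even_down, half_to_even_up, two_digits)
```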
Another useful function is abs, which takes one numerical argument and
returns its absolute value.
In []: abs(-5)
Out[]: 5
In []: abs(5)
Out[]: 5
In []: abs(5.0)
Out[]: 5.0
In []: 5e1
Out[]: 50.0
In []: 5e-1
Out[]: 0.5
In []: 5E2
Out[]: 500.0
It is called a literal because we use its value literally. The number 5 always
represents itself and nothing else -- it is a constant because its value cannot
be changed. Similarly, the value 2.85 represents itself. Hence, all these are
said to be literal constants.
2.2.2 Numbers
We have already covered numbers in detail in the above section. Here we
will discuss them in brief. Numbers can be broadly classified into two types -
integer and float.
Examples of a floating point number (floats for short) are - 2.98745, 5.5,
5e-1, etc. Here, e refers to the power of 10. We can write either e or E, both
work just fine.
2.2.3 Strings
Simply put, a string is a sequence of characters. We use strings almost
everywhere in Python code. Python supports both ASCII and Unicode
strings. Let us explore strings in more detail.
Single Quote - We can specify a string using single quotes such as 'Python
is an easy programming language!'. All spaces and tabs within the
quotes are preserved as-is.
Double Quotes - We can also specify a string using double quotes such as
"Yes! Indeed, Python is easy.". Double quotes work the same way
single quotes work. Either can be used.
Triple Quotes - Triple quotes (''' or """) delimit a string that can span
multiple lines. Such a string is often used as a multi-line comment. We
explain it in greater detail in the next topic.
Strings are immutable - This means once we have created a string we cannot
change it. Consider the following example.
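The original example is not reproduced in this excerpt, so here is a minimal sketch of what immutability means in practice. The variable names are illustrative assumptions: assigning to a position inside a string raises a TypeError, and the usual workaround is to build a new string.

```python
ticker = 'AAPL'
outcome = None

try:
    # Strings do not support item assignment, so this line fails.
    ticker[0] = 'B'
except TypeError as error:
    outcome = f'TypeError: {error}'

# To "change" a string we build a new one from pieces of the old one.
new_ticker = 'B' + ticker[1:]

print(outcome)
print(ticker, new_ticker)
```

The original string is left untouched; only a new string object is created.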
2.2.4 Comments
We have already seen comments before. Comments are used to annotate
code, and they are not interpreted by Python. Comments in Python start
with the hash character # and end at the end of the physical line in the
code. A comment may appear at the start of a line or following whitespace
or code, but not within a string literal. A hash character within a string
literal is just a hash character. This type of comment is also known as a
single-line comment.
The other way we can annotate code is by using a multi-line comment that
serves as a reference or documentation for others to understand the code.
# Following line adds two integer numbers
In []: 5 + 3
Out[]: 8
We can write a comment after a line of code to annotate what that particular
line does, as depicted in the following example.
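Since the original example is missing from this excerpt, the sketch below shows both styles of annotation described above: a comment placed after code on the same line, and a triple-quoted string used as multi-line documentation. The function total_cost and its arguments are hypothetical names introduced only for illustration.

```python
def total_cost(price, quantity):
    """Return the cost of buying `quantity` shares at `price`.

    A triple-quoted string placed first in a function is a docstring
    and serves as multi-line documentation.
    """
    return price * quantity  # Multiply the unit price by the quantity


cost = total_cost(226, 10)  # Cost of ten shares at 226 each
print(cost)
```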
Let us visit a few examples to understand how print() works.
# f-strings
In []: print(f'The stock ticker for Apple Inc is {stock_name}.')
Out[]: The stock ticker for Apple Inc is AAPL.
The above string is called a formatted string literal. Such strings are
preceded by the letter f, which indicates that the variable names placed
between curly brackets {} are replaced by their values. stock_name here is
a variable name containing the symbol for a stock.
# %-formatting strings
In []: print("%s is currently trading at %.2f."
%(stock_name, price))
Out[]: AAPL is currently trading at 226.41.
Here we print the current trading price of AAPL stock. The stock name is stored
in the variable stock_name, and its price is stored in the variable price. %s
is a placeholder for a string value and %f is a placeholder for a float value.
We use %.2f to limit the output to two digits after the decimal point.
Upon running the above code, we will be presented with the following out-
put.
# Output
Out[]: We are interested in AAPL which is currently
trading at 226.41
The above code will first prepare a string internally by substituting the x and y
placeholders with the variables stock_ticker and price respectively, and then
print the final output as a single string. Instead of using placeholders, we
can also construct a string in the following manner:
In []: print('We are interested in {0} which is currently '
             'trading at {1}'.format(stock_ticker, price))
Here, the output will be similar to the above illustration. A string can be
constructed using certain specifications, and the format function can be
called to substitute those specifications with corresponding arguments of
the format function. In the above example, {0} will be substituted by the
variable stock_ticker and similarly, {1} will get the value of price. Numbers
provided inside the specification are optional, and hence we can also write
the same statement as follows:
In []: print('We are interested in {} which is currently '
             'trading at {}'.format(stock_ticker, price))
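The three formatting styles discussed above (f-strings, %-formatting and the format function) can be compared side by side. This is a small sketch; the values assigned to stock_name and price are taken from the outputs shown earlier, and the remaining variable names are ours.

```python
stock_name = 'AAPL'
price = 226.41

# Formatted string literal: expressions inside {} are evaluated in place.
f_string = f'{stock_name} is currently trading at {price:.2f}.'

# %-formatting: %s takes a string, %.2f a float with two decimal digits.
percent_style = '%s is currently trading at %.2f.' % (stock_name, price)

# str.format: {} placeholders are filled by the arguments, in order.
format_method = '{} is currently trading at {:.2f}.'.format(stock_name, price)

print(f_string)
print(f_string == percent_style == format_method)
```

All three produce the same text, so the choice is largely one of readability.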
In []: print('AAPL.\tNIFTY50.\tDJIA.\tNIKKEI225.')
Out[]: AAPL. NIFTY50. DJIA. NIKKEI225.
In a string, a single \ at the end of a line indicates that the string
is continued on the next line and that no new line is added.
Consider the example below:
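The original example is not included in this excerpt; a minimal sketch of the same idea follows. The text of the string is our own. Note that the continuation line starts at the left margin, because any leading spaces on it would become part of the string.

```python
# The backslash at the end of the first line joins it with the next one.
message = 'This string is continued \
on the next line.'

print(message)
```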
Likewise, there are many more escape sequences which can be found on
the official Python documentation1 .
2.2.8 Indentation
Whitespace is important in Python. Whitespace at the start of a line is
called indentation. It is used to mark the start of a new code block. A block or
code block is a group of statements in a program or a script. Leading spaces
at the beginning of a line are used to determine the indentation level, which
in turn is used to determine the grouping of statements. Also, statements
which go together must have the same indentation level.
Wrong indentation raises an error. For example,
stock_name = 'AAPL'
# Correct indentation.
print('Stock name is', stock_name)
Upon running the above code, we will be presented with the following
error
The error indicates to us that the syntax of the program is invalid. That is,
the program is not properly written. We cannot indent new blocks of
statements arbitrarily. Indentation is widely used to define new blocks
when defining functions, control flow statements, etc., which we will be
discussing in detail in the upcoming chapters.
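Because the error output is not reproduced in this excerpt, the following sketch shows one way to observe it without crashing a script. It is an illustration under our own assumptions: the badly indented program is kept in a string (source) and handed to the built-in compile function, which raises IndentationError, a subclass of SyntaxError.

```python
# A two-line program whose second line is indented for no reason.
source = "stock_name = 'AAPL'\n    print('Stock name is', stock_name)\n"

message = None
try:
    compile(source, '<example>', 'exec')
except IndentationError as error:
    # error.msg holds the short description of the problem.
    message = f'{type(error).__name__}: {error.msg}'

print(message)
```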
Chapter 3
We have previously seen that a variable can take data in various formats
such as a string, an integer, a number with fractional parts (float), etc. It
is now time to look at each of these concepts in greater detail. We start by
defining a variable.
3.1 Variables
A variable can be thought of as a container having a name which is used to
store a value. In programming parlance, it is a reserved memory location
to store values. In other words, a variable in a Python program gives
necessary data to a computer for processing.
In this section, we will learn about variables and their types. Let us start by
creating a variable.
sign =, a.k.a. the assignment operator. A variable is created the moment we
assign the first value to it.
# Creating a variable
In []: price = 226
In []: print(price)
Out[]: 226 # Output
Later, if we change the value of price and run the print statement again,
the new value will appear as output. This is known as re-declaration of the
variable.
The chained assignment shown in the above example assigns the value 200
to variables x, y, and z simultaneously.
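The examples for re-declaration and chained assignment are not shown in this excerpt. The sketch below reconstructs the idea under our own assumptions: the second value given to price (230) is hypothetical, while the value 200 for x, y and z comes from the text.

```python
# Creating a variable and then assigning a new value to the same name.
price = 226
first_value = price
price = 230  # hypothetical new value; the old one is replaced
second_value = price

# Chained assignment gives the same value to several variables at once.
x = y = z = 200

print(first_value, second_value)
print(x, y, z)
```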
stock = 'AAPL' # Valid name
_name = 'AAPL' # Valid name
STOCK = 'AAPL'
stock = 'MSFT'
Stock = 'GOOG'
# Valid name.
stockname = 'AAPL'
# Valid name.
Stock_name = 'AAPL'
• Integer
• Float
• String
• Boolean
Though we have already had a brief overview of integer, float and string in
the previous section, we will cover these data types in greater detail in this
section.
3.2.1 Integer
An integer can be thought of as a numeric value without any decimal part. In
fact, it is used to describe any whole number in Python such as 7, 256, 1024,
etc. An integer can represent any whole number, negative or positive;
Python does not impose a fixed upper or lower limit on its size. Such
numbers are assigned to variables using the assignment operator.
In []: total_output_of_dice_roll = 6
In []: days_elapsed = 30
In []: total_months = 12
In []: year = 2019
3.2.2 Float
A float stands for floating point number, which essentially means a number
with a fractional part. It is used for numbers written with a decimal
point such as 6.5, 100.1, 123.45, etc. Below are some examples
where a float value is more appropriate than an integer.
By doing so, we get a fairly good idea how data types and variable names go hand
in hand. This, in turn, can be used in expressions to perform any mathematical
calculation.
Let’s revisit the topic Python as a Calculator very briefly but this time using
variables.
# Addition
In []: print(x + y)
Out[]: 12.0
# Subtraction
In []: print(x - y)
Out[]: -8.0
# Multiplication
In []: print(x * y)
Out[]: 20.0
# Division
In []: print(x / y)
Out[]: 0.2
# Modulo
In []: print(x % y)
Out[]: 2.0
# Exponential / Power
In []: print(x ** y)
Out[]: 1024.0
NOTE : Please note the precise use of comments in the code
snippet to describe the functionality. Also, note that the output of
every expression is a float because one of the values used in the
input is a float.
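The assignments of x and y are not shown in this excerpt. The sketch below assumes x = 2 and y = 10.0, which is one pair of values consistent with every output listed above; the original may have used a different but equivalent pair (for example, with x as the float).

```python
# Assumed values: an integer and a float (not shown in the excerpt).
x = 2
y = 10.0

results = {
    'addition': x + y,
    'subtraction': x - y,
    'multiplication': x * y,
    'division': x / y,
    'modulo': x % y,
    'power': x ** y,
}

# Every result is a float because one operand is a float.
all_floats = all(isinstance(value, float) for value in results.values())

print(results)
print(all_floats)
```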
3.2.3 Boolean
This built-in data type can have one of two values, True or False. We use
an assignment operator = to assign a boolean value to variables in a manner
similar to what we have seen for integer and float values. For example:
In []: print(buy)
Out[]: True
In []: print(sell)
Out[]: False
The above examples are some of the simplest boolean expressions that
evaluate to either True or False.
upper case followed by lower case letters. The following
will not be evaluated to a boolean value:
- 'TRUE'
- TRUE
- true
- 'FALSE'
- FALSE
- false
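The spelling rule above can be checked directly. In this sketch the names buy and sell follow the earlier outputs, while label and lowercase_flag are our own illustrative names: a quoted 'TRUE' is just a string, and a lowercase true is an undefined name rather than a boolean.

```python
buy = True
sell = False

# A quoted word is a string, not a boolean.
label = 'TRUE'

# A lowercase true is simply an undefined name in Python.
try:
    lowercase_flag = true
except NameError:
    lowercase_flag = None

print(type(buy).__name__, type(sell).__name__, type(label).__name__)
print(lowercase_flag)
```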
3.2.4 String
A string is a collection of letters, digits, and other characters written
within single quotes ' or double quotes ". In other words, it is a sequence
of characters within quotes. Let us understand how a string works with the
help of some examples.
# Variable assignment with a string
In []: sample_string = '1% can also be expressed as 0.01'
In []: stock_price
Out[]: '224.61'
# Re-declaring the variable with a float value
In []: stock_price = 224.61
In []: string * 3
Out[]: 'Python! Python! Python! '
We can select a substring or part of a string using the slice operation. Slicing
is performed using square brackets []. The syntax for selecting a single
element from the string is [index], which returns the element at index.
The index refers to the position of each element in a string; it begins with
0 and increases by one for every subsequent element.
To slice a substring from a string, the syntax used is [start index:end
index], which returns the substring starting from the element at start
index up to but not including the element at end index. Consider the
following example, where we slice the string from index 0 up to 4,
which yields the output 'EPAT'. Notice how the element ' ' at index 4
is not included in the output. Similarly, we slice another substring as
seen in the example below.
In []: string[0:4]
Out[]: 'EPAT'
In []: string[4]
Out[]: ' '
In []: string[5:13]
Out[]: 'Handbook'
In []: string[13]
Out[]: '!'
In []: string[14]
Traceback (most recent call last):
In the above example, the last index is 13. Accessing an element at
index 14 results in an IndexError, stating that the index we
are looking for is not present.
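The difference between indexing and slicing at the end of a string can be sketched as follows. The string value is inferred from the outputs above; the other variable names are ours. Indexing past the last position raises an IndexError, whereas a slice whose end lies beyond the string is silently clipped.

```python
string = 'EPAT Handbook!'
last_index = len(string) - 1  # positions run from 0 to len - 1

first_word = string[0:4]  # end index 4 is not included

index_error = None
try:
    string[14]  # one past the last valid index
except IndexError as error:
    index_error = str(error)

# A slice never raises for an out-of-range end; it stops at the end.
clipped = string[5:100]

print(last_index, first_word, index_error, clipped)
```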
NOTE: Some important points about string literals are listed below:
- In Python 3.x, all strings are Unicode by default.
- A string can be written within either '' or "". Both work fine.
- Strings are immutable (although the variable can be reassigned to a
  new string).
- An escape sequence is used within a string to mark a new line, insert a
  tab, write a \ character, etc.
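The points on immutability and escape sequences can be illustrated with a minimal sketch. The variable name text is hypothetical and chosen only for this example.

```python
text = 'EPAT'

# Strings are immutable: assigning to a single position fails
try:
    text[0] = 'e'
except TypeError as error:
    print('TypeError:', error)

# The variable itself can be rebound to a new string object
text = text.lower()
print(text)                     # epat

# Common escape sequences
print('line one\nline two')     # \n starts a new line
print('col1\tcol2')             # \t inserts a tab
print('a backslash: \\')        # \\ prints a single backslash
```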
• upper() method: This method returns the upper case version of the
string.
In []: sample_string.upper()
Out[]: 'EPAT HANDBOOK!'
• lower() method: This method returns the lower case version of the
string.
In []: sample_string.lower()
Out[]: 'epat handbook!'
• isalpha() method: This method returns the boolean value True if all
characters in a string are letters, False otherwise.
In []: 'Alphabets'.isalpha()
Out[]: True
# The string under evaluation contains whitespace.
In []: 'This string contains only alphabets'.isalpha()
Out[]: False
• isdigit() method: This method returns the boolean value True if all
characters in a string are digits, False otherwise.
In []: '12345'.isdigit()
Out[]: True
# Replace 0 with 1
In []: '00 01 10 11'.replace('0', '1')
Out[]: '11 11 11 11'
Here, Python returns three strings in a single data structure called a list.
We will learn about lists in more detail in the upcoming section.
In []: 'EPAT Handbook'.count('o')
Out[]: 2
In []: type(False)
Out[]: bool
# An object passed as an argument belongs to the
# class 'list'.
In []: type([1, 2, 3])
Out[]: list
Lists, dictionaries, tuples, and sets are native data structures in Python.
We will learn about these data structures in the upcoming section.
In []: 8 / 2
Out[]: 4.0
Here we attempted to join a string 'This is the year ' and an integer
2019. Doing so, Python threw an error TypeError stating incompatible data
types. One way to perform the concatenation between the two is to convert
the data type of 2019 to string explicitly and then perform the operation.
We use str() to convert an integer to string.
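A minimal sketch of the failing concatenation and its fix is given below. The variable names year and message are assumptions made for illustration.

```python
year = 2019

# Joining a string and an integer directly raises a TypeError
try:
    message = 'This is the year ' + year
except TypeError as error:
    print('TypeError:', error)

# Converting the integer explicitly makes the operation valid
message = 'This is the year ' + str(year)
print(message)    # This is the year 2019
```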
Similarly, we can explicitly change the data type of literals in the following
manner.
Out[]: 4.2
In []: float('4.0')
Out[]: 4.0
In []: int(4.2)
Out[]: 4
In the above example, we have seen how we can change the data type of
literals from one to another. Similarly, the boolean data type represented by
bool is no different. We can typecast bool to int as we do for the rest. In
fact, Python internally treats the boolean value False as 0 and True as 1.
# Boolean to integer conversion
In []: int(False)
Out[]: 0
In []: int(True)
Out[]: 1
In []: bool(1)
Out[]: True
In []: bool(-1)
Out[]: True
In []: bool(125)
Out[]: True
5. Variable names are case sensitive and cannot start with a number.
6. There are four primitive data types in Python:
(a) Integer represented by int
(b) Float represented by float
(c) String represented by str
(d) Boolean (True or False) represented by bool
7. Internally, True is treated as 1 and False is treated as 0 in Python.
8. A substring or a part of a string is selected using the square brackets
[] also known as slice operation.
9. Type conversion happens either implicitly or explicitly.
(a) Implicit type conversion happens when an operation with compatible
data types is executed. For example, 4/2 (division of two integers)
will return 2.0 (float output).
(b) When an operation involves incompatible data types, they need
to be converted to compatible or similar data type. For example:
To print a string and an integer together, the integer value needs
to be converted to a string before printing.
Chapter 4
To support this, Python has a way to put definitions in a file and use
them in another script or directly in an interactive instance of the
interpreter. Such a file is called a module; definitions from a module can be
imported into other modules or into the program that we code.
# -*- coding: utf-8 -*-
"""
Created on Fri Sep 21 09:29:05 2018
@filename: arithmetic.py
@author: Jay Parmar
"""
def factorial(n):
    """Returns the factorial of n"""
    i = 0
    result = 1
    # Comparing with < (not !=) so the loop also ends for negative n
    while i < n:
        i = i + 1
        result = result * i
    return result
We are now ready to import this file in other scripts or directly into the
Python interpreter. We can do so with the following command:
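The mechanics of saving definitions in a file and importing them can be sketched in a self-contained way. The module name arithmetic_demo and the temporary directory are assumptions made so that the example runs anywhere; in practice the file would simply sit next to the script that imports it.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module source; only factorial() mirrors the text above
source = '''
def factorial(n):
    """Returns the factorial of n"""
    result = 1
    for i in range(1, n + 1):
        result = result * i
    return result
'''

# Write the module file into a temporary directory
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'arithmetic_demo.py'), 'w') as f:
    f.write(source)

# Make the directory visible to the import system, then import
sys.path.append(module_dir)
importlib.invalidate_caches()
import arithmetic_demo

print(arithmetic_demo.factorial(5))    # 120
print(arithmetic_demo.__name__)        # arithmetic_demo
```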
Once we have imported the module, we can start using its definitions in the
script without re-writing the same code. We can access functions within the
imported module using the module name. Consider an example below:
In []: print(result)
Out[]: 5
In []: arithmetic.multiply(3, 5)
Out[]: 15
In []: arithmetic.division(10, 4)
Out[]: 2.5
In []: arithmetic.factorial(5)
Out[]: 120
In []: arithmetic.__name__
Out[]: 'arithmetic'
• The directory containing the input script (or the current directory).
• PYTHONPATH (An environment variable)
• The installation-dependent path.
Here, we have created a module named arithmetic that can be imported
into other modules as well. Apart from this, Python has a large set
of built-in modules known as the Python Standard Library, which we will
discuss next.
4.1 Standard Modules
Python comes with a library of standard modules, also referred to as the
Python Standard Library1. Some modules are built into the interpreter;
these modules provide access to operations that are not part of the core
of the language but are nevertheless built in, either for efficiency or to
provide access to operating system primitives. The set of such modules
available also depends on the underlying platform. For example, the winreg2
module is available only on the Windows platform.
The Python installers for the Windows platform usually include the entire
standard library and often also include many additional components. For
Unix-like operating systems, Python is normally provided as a collection
of packages, so it may be necessary to use the packaging tools provided
with the operating system to obtain some or all of the optional components.
One particular module that deserves attention is sys, which is built into
every Python interpreter. This module provides access to variables used or
maintained by the interpreter and to functions that interact with the inter-
preter. It is always available and used as follows:
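A minimal sketch of typical sys usage is shown below. The printed values depend on the installation, so no specific output is assumed.

```python
import sys

# Version string of the running interpreter
print(sys.version)

# Identifier of the underlying platform, e.g. 'linux' or 'win32'
print(sys.platform)

# Copyright notice of the interpreter (first 60 characters only)
print(sys.copyright[:60])

# sys.path is a plain list of directories searched for modules
print(type(sys.path), len(sys.path))
```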
Research Initiatives.\nAll Rights Reserved.\n\n
Copyright (c) 1991-1995 Stichting Mathematisch
Centrum, Amsterdam.\nAll Rights Reserved.'
• Concurrent Execution: threading, multiprocessing, sched, queue,
etc.
• Networking: socket, ssl, asyncio, signal, etc.
• Internet Data Handling: email, json, mailbox, mimetypes, binascii,
etc.
• Internet Protocols: urllib, http, ftplib, smtplib, telnetlib, xmlrpc,
etc.
4.2 Packages
A package can be considered a collection of modules. Packages are a way of
structuring Python’s module namespace by using "dotted module names". For
example, the module name matplotlib.pyplot designates a submodule
named pyplot in a package named matplotlib. Packaging modules in
such a way saves the authors of different modules from having to worry
about each other’s global variable names, and the use of dotted module
names saves the authors of multi-module packages from having to worry
about each other’s module names.
equity.py Equity module
currency.py
options.py
...
strategies/ Sub-package for strategies
__init__.py
rsi.py RSI module
macd.py
smalma.py
peratio.py
fundamentalindex.py
statisticalarbitrage.py
turtle.py
...
operations/ Sub-package for operations
__init__.py
performanceanalytics.py
dataconversion.py
...
import strats.data.equity
import strats.strategies.statisticalarbitrage
The goal here is to install software that can automatically download and
install Python modules/libraries for us. Two commonly used installation
managers are conda3 and pip4 . We choose to go with pip for our installa-
tions.
pip comes pre-installed with Python 2 >= 2.7.9 or Python 3 >= 3.4
downloaded from the official Python site5. If the Anaconda distribution
has been installed, both pip and conda are available to manage package
installations.
curl -O https://bootstrap.pypa.io/get-pip.py
python get-pip.py
The above command may fail on a Mac or a Linux distribution due to
permission issues, most likely because Python does not have permission to
update certain directories on the file system. These directories are
read-only by default to ensure that random scripts cannot tamper with
important files or infect the system with viruses. In that case, we may need
to run the following command.
usually the documentation or installation instructions will include the
necessary pip command.
The Python Package Index6 is the main repository for third-party Python
packages. The advantage of a library being available on PyPI is the ease of
installation using pip install <package_name> such as
Again, if the above command fails on a Mac or a Linux distribution due to
a permission issue, we can run the following command:
The above examples will install the latest version of the libraries. To install
a specific version, we execute the following command:
To install greater than or equal to one version and less than another:
Listed below are some of the most popular libraries used in different do-
mains:
We can upgrade already installed libraries to the latest version from PyPI
using the following command:
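The different forms of the pip install command discussed above (latest version, pinned version, version range, upgrade) can be summarised in a small sketch. It only builds the commands and does not run them; the helper pip_install_command and the package name somepackage are hypothetical placeholders.

```python
import sys


def pip_install_command(requirement, upgrade=False):
    """Build (but do not run) a pip install command as an argument list."""
    # Using 'python -m pip' targets the pip of the running interpreter
    command = [sys.executable, '-m', 'pip', 'install']
    if upgrade:
        command.append('--upgrade')
    command.append(requirement)
    return command


# 'somepackage' is a placeholder, not a real package recommendation
print(pip_install_command('somepackage'))              # latest version
print(pip_install_command('somepackage==1.0.4'))       # exact version
print(pip_install_command('somepackage>=1.0,<2.0'))    # version range
print(pip_install_command('somepackage', upgrade=True))
```

Any of these lists could be passed to subprocess.run() to execute the installation from within a script.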
6 https://pypi.org/
The above example will import all definitions within the imported library.
We can use these definitions using . (dot operator). For example,
# Accessing the 'DataFrame' class from the 'pandas'
# library
In []: pandas.DataFrame
Out[]: pandas.core.frame.DataFrame
As seen in the above example, we can access attributes and methods of the
imported library using the dot operator along with the library name. In fact,
the library we import acts as an object and hence, we can call its attributes
using the dot notation. We can also alias a library name while importing it
with the help of the as keyword.
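A short sketch of aliasing with the standard math module follows. The alias m is just a conventional short name chosen for the example.

```python
import math
import math as m

# The alias refers to the very same module object
print(m is math)      # True

# Attributes and functions are reached through the alias
print(m.pi)
print(m.sqrt(16))     # 4.0
```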
In []: m.pi
Out[]: 3.141592653589793
In []: m.e
Out[]: 2.718281828459045
In []: m.gamma
Out[]: <function math.gamma>
# Import all definitions from pyplot module of matplotlib
# library
In []: from matplotlib.pyplot import *
In []: e
Out[]: 2.718281828459045
In []: floor(10.8)
Out[]: 10
# Selective import
# Import only floor from math module
In []: from math import floor
In []: floor(10.8)
Out[]: 10
# Error line as the math is not imported
# Only the floor from math is imported
In []: math.ceil(10.2)
Traceback (most recent call last):
In the above example, we selectively import floor from the math module.
If we try to access any other definition through the name math, Python
will raise a NameError stating that the name math is not defined, because
only floor, and not the math module itself, was brought into the namespace.
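The effect of a selective import can be reproduced with the following sketch, which assumes a fresh session in which the math module itself has not been imported.

```python
# Only the name 'floor' is brought into the namespace
from math import floor

print(floor(10.8))    # 10

# The module name 'math' itself was never bound
try:
    math.ceil(10.2)
except NameError as error:
    print('NameError:', error)
```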
# Content of strategy/vwap_module.py
def run_strategy():
    print('Running strategy logic')

# Content of backtest/backtesting.py
import vwap_module
vwap_module.run_strategy()
File "backtest/backtesting.py", line 1, in <module>
import vwap_module
When Python hits the line import vwap_module, it tries to find a package
or a module called vwap_module. A module is a file with a matching
extension, such as .py. Here, Python is looking for a file vwap_module.py
in the same directory where backtesting.py exists, and not finding it.
Python has a simple algorithm for finding a module with a given name,
such as vwap_module. It looks for a file called vwap_module.py in the direc-
tories listed in the variable sys.path.
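The search can be observed without actually importing anything by using importlib.util.find_spec() from the standard library. In this sketch, json stands in for any module that exists, and the second module name is a deliberately non-existent placeholder.

```python
import importlib.util

# A module that can be found: the spec records where it lives
spec = importlib.util.find_spec('json')
print(spec.name)
print(spec.origin)

# A module that is not on the search path: find_spec returns None
missing = importlib.util.find_spec('vwap_module_that_does_not_exist')
print(missing)    # None
```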
In []: type(sys.path)
Out[]: list
C:\Users\...\Continuum\anaconda3\python36.zip
C:\Users\...\Continuum\anaconda3\DLLs
C:\Users\...\Continuum\anaconda3\lib
C:\Users\...\Continuum\anaconda3
C:\Users\...\Continuum\anaconda3\lib\site-packages
C:\Users\...\Continuum\anaconda3\lib\site-packages\win32
C:\Users\...\Continuum\anaconda3\lib\site-packages\win32\lib
C:\Users\...\Continuum\anaconda3\lib\site-packages\Pythonwin
C:\Users\...\.ipython
In the above code snippet, we print the paths present in sys.path. The
vwap_module.py file is in the strategy directory, and this directory is not
in the sys.path list.
Because sys.path is just a Python list, we can make the import statement
work by appending the strategy directory to the list.
In []: import sys
In []: sys.path.append('strategy')
As a crude hack, we can keep the module in the same directory as the code
file.
4.5 dir() function
We can use the built-in function dir() to find which names a module de-
fines. It returns a sorted list of strings.
In []: dir(arithmetic)
Out[]:
['__builtins__',
'__cached__',
'__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__',
'addition',
'division',
'factorial',
'multiply']
Here, we can see a sorted list of names within the module arithmetic. All
other names that begin with an underscore are default Python attributes
associated with the module (we did not define them.)
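To separate the names we care about from the default attributes, the list returned by dir() can be filtered. The sketch below uses the standard math module as a stand-in, and public_names is a hypothetical variable name.

```python
import math

# Keep only the names that do not begin with an underscore
public_names = [name for name in dir(math) if not name.startswith('_')]

print(public_names[:5])
print('floor' in public_names)       # True
print('__name__' in public_names)    # False
```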
In []: a = 1
In []: b = 'string'
In []: dir()
Out[]:
['__builtins__',
'a',
'arithmetic',
'b',
'exit',
'quit']
Note that it lists all types of names: variables, modules, functions, etc.
dir() does not list the names of built-in functions and variables. They
are defined in the standard module builtins. We can list them by importing
builtins and passing it as an argument to dir().
In []: import builtins
In []: dir(builtins)
Out[]: ['ArithmeticError', 'AssertionError',
'AttributeError', 'BaseException',
'BlockingIOError', 'BrokenPipeError',
'BufferError', 'BytesWarning', 'ChildProcessError',
'ConnectionAbortedError', 'ConnectionError',
'ConnectionRefusedError', 'ConnectionResetError',
'DeprecationWarning', 'EOFError', 'Ellipsis',
'EnvironmentError', 'Exception', 'False',
'SyntaxError', ... ]
4.6 Key Takeaways
1. A module is a Python file which can be referenced and used in other
Python code.
2. A single module can also have multiple Python files grouped to-
gether.
3. A collection of modules is known as a package or a library. The
words library and package are used interchangeably.
4. Python comes with a large set of built-in libraries known as the
Python Standard Library.
5. Modules in the Python Standard Library provide access to core system
functionality and solutions for many problems that occur in everyday
programming.
6. The sys library is present in every Python installation irrespective of
the distribution and underlying architecture and it acts as an interme-
diary between the system and Python.
7. In addition to built-in libraries, additional third-party/external li-
braries can be installed using either the pip or conda package man-
agers.
8. The pip command comes pre-installed with Python 2 >= 2.7.9 or
Python 3 >= 3.4.
9. A library (either built-in or external) needs to be imported into
Python code before it can be used. This is achieved using the import
keyword, as in import library_name.
10. It is a good idea to alias the library name that we import using an as
keyword.
11. It is always a good programming practice to selectively import only
those modules which are required, instead of importing the whole
library.
12. Python will look for the library being imported in the module
search path. If the library is not available in any of the paths listed in
the module search path, Python will throw an error.
13. The dir() function is used to list all attributes and methods of an
object. If a library name is passed as an argument to the dir(), it
returns sub-modules and functions of the library.
Chapter 5
Data Structures
In this section we will learn about various built-in data structures such as
tuples, lists, dictionaries, and sets. Like a variable, a data structure is
used to store data. Unlike a simple variable, it stores not just one value
but a collection of values in various formats. Broadly, data structures are
divided into arrays, lists, and files. Arrays can be considered a basic form
of data structure, while files are more advanced and store complex data.
Index 0 1 2 3 4 5 6 7 8 9
Sequence A B C D E F G H I J
each step. Whenever a new character is appended to this sequence, it will
be appended at the end, and will be assigned the next index value (in the
above example, the new index will be 10 for the new character). Almost all
data structures in Python have an index to position and locate the element.
Elements within the sequence can be accessed using the square brackets [].
It takes index of an element and returns the element itself. The syntax for
accessing a single element is as follows:
sequence[i]
The above statement will return the element from sequence at index i. We
can access multiple elements from the sequence using the syntax [start
index : end index] in the following manner:
sequence[si : ei]
The above statement will return values starting at index si up to but NOT
including the element at index ei. This operation is referred to as slicing.
For example:
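A minimal sketch of indexing and slicing is given below. It assumes the sequence from the table above is stored as the string 'ABCDEFGHIJ'.

```python
# The sequence from the table above, stored as a string
sequence = 'ABCDEFGHIJ'

print(sequence[2])      # C    (single element at index 2)
print(sequence[2:5])    # CDE  (index 2 up to, not including, 5)
print(sequence[:3])     # ABC  (start index omitted: from the beginning)
print(sequence[7:])     # HIJ  (end index omitted: up to the end)
```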
Python also supports negative indexing to access elements from the se-
quence end and it starts with -1 as follows:
Index             0   1   2   3   4   5   6   7   8   9
Sequence          A   B   C   D   E   F   G   H   I   J
Negative Index  -10  -9  -8  -7  -6  -5  -4  -3  -2  -1
A sequence can also be sliced using the negative indexing. In order to access
the last element, we write
sequence[-1]
and it will return the element J. Similarly, a range can be provided to access
multiple elements.
sequence[-5:-1] will return elements from 'F' to 'I'
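Negative indexing can be tried out with the same sequence. The sketch again assumes it is stored as the string 'ABCDEFGHIJ'.

```python
sequence = 'ABCDEFGHIJ'

print(sequence[-1])       # J     (last element)
print(sequence[-5:-1])    # FGHI  (index -5 up to, not including, -1)
print(sequence[-3:])      # HIJ   (last three elements)

# Index -10 and index 0 refer to the same element
print(sequence[-10] == sequence[0])    # True
```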
5.2 Array
An array can be thought of as a container that can hold a fixed number
of data values of the same type. Though the use of arrays is less popular in
Python as compared to other languages such as C and Java, most other data
structures internally make use of arrays to implement their algorithms. An
array consists of two components, viz. element and index.
We can create an array by using the built-in array module. It can be created
as follows:
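A minimal sketch of creating an array with the standard array module is shown here. The values match the array displayed in this section; the attempt to append a float is added only to illustrate that all elements must share the declared type.

```python
from array import array

# 'i' is the type code for signed integers
arr = array('i', [2, 4, 6, 8])
print(arr)
print(arr.typecode)    # i

# All elements must match the type code
try:
    arr.append(2.5)
except TypeError as error:
    print('TypeError:', error)
```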
In []: arr
Out[]: array('i', [2, 4, 6, 8])
In []: type(arr)
Out[]: array.array
In the above example, we import the array class from the array module
and then initialize the variable arr with the values 2, 4, 6, and 8 within
square brackets. The i is the type code of the values; in this case, it
represents a signed integer. The Python array documentation1 provides more
information about the various type codes available in Python.
1 https://docs.python.org/3.4/library/array.html
Index 0 1 2 3
Element 2 4 6 8
In []: arr[0]
Out[]: 2
We use insertion operation to insert one or more data elements into an array.
Based on the requirement, an element can be inserted at the beginning, end
or any given index using the insert() method.
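The insertion operations can be sketched as follows. The starting values mirror the array used in this section; the final append() call is an addition made here to cover insertion at the end.

```python
from array import array

arr = array('i', [2, 4, 6, 8])

# insert(index, value): insert at the beginning
arr.insert(0, 20)
print(arr)    # array('i', [20, 2, 4, 6, 8])

# Insert at a given index
arr.insert(3, 60)
print(arr)    # array('i', [20, 2, 4, 60, 6, 8])

# append(value): insert at the end
arr.append(10)
print(arr)    # array('i', [20, 2, 4, 60, 6, 8, 10])
```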
In []: arr
Out[]: array('i', [20, 2, 4, 6, 8])
In []: arr
Out[]: array('i', [20, 2, 4, 60, 6, 8])
In []: arr
Out[]: array('i', [2, 4, 60, 6, 8])
In []: arr.remove(60)
In []: arr
Out[]: array('i', [2, 4, 6, 8])
We can update an element at the specific index using the assignment oper-
ator = in the following manner:
# Update an element at index 0
In []: arr[0] = 1
In []: arr
Out[]: array('i', [1, 4, 6, 8])
In []: arr
Out[]: array('i', [1, 4, 6, 7])
Though Python allows us to perform a wide variety of operations on arrays,
the built-in array module is rarely used. Instead, in real-world
programming most programmers prefer to use NumPy arrays provided by the
NumPy library.
5.3 Tuples
In Python, tuples are a built-in data type. Like arrays, tuples hold
multiple values within them, separated by commas. In addition, they allow
storing values of different types together. Tuples are immutable and
usually contain a heterogeneous sequence of elements that are accessed
via unpacking or indexing.
To create a tuple, we place all elements within parentheses (). Unlike
arrays, we need not import any module for using tuples.
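A few ways of creating tuples are sketched below. The names mixed, no_parens, single, and not_a_tuple are hypothetical and used only for illustration.

```python
# Elements within parentheses
tup = (1, 2, 3)

# Values of different types can be stored together
mixed = (1, 'a', 2.5)

# The commas make the tuple; the parentheses are optional
no_parens = 4, 5, 6

# A one-element tuple needs a trailing comma
single = (7,)
not_a_tuple = (7)

print(type(tup), type(no_parens))
print(type(single), type(not_a_tuple))
```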
In []: tup
Out[]: (1, 2, 3)
In []: type(tupl)
Out[]: tuple
The tuple tupl created above can be visualized in the following manner:
Index 0 1 2
Element 1 'a' 2.5
A tuple can also be created without using the brackets.
In []: type(tup)
Out[]: tuple
In []: tupl
Out[]: (1, 1, 1, 1, 1)
Python throws an error if we try to access an element that does not exist.
In other words, if we index a tuple with a non-existent index, we will get
an error.
In []: tup[3]
Traceback (most recent call last):
tup[3]
In the above example, we try to access an element with index 3 which does
not exist. Hence, Python threw an error stating index out of range.
The built-in len() function is used to check the length of a tuple.
In []: len(tup)
Out[]: 3
In []: len(tupl)
Out[]: 5
5.3.2 Immutability
In Python, tuple objects are immutable. That is, once they are created, they
cannot be modified. If we try to modify a tuple, Python will throw an error.
In []: tup[1] = 10
Traceback (most recent call last):
In []: t1 = (1, 2, 3)
In []: t2 = (4, 5)
In []: t1 + t2
Out[]: (1, 2, 3, 4, 5)
In []: t1 = (1, 2, 3)
In []: t1 += 4, 5
In []: t1
Out[]: (1, 2, 3, 4, 5)
In []: tup
Out[]: (1, 2, 3)
In []: x, y, z = tup
The above statement performs the unpacking operation. It will assign the
value 1 to the variable x, 2 to y, and 3 to z. This operation requires that
there are as many variables on the left hand side of the equal sign as there
are elements in the tuple.
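The requirement on the number of variables can be checked with a short sketch. The starred form at the end is standard Python 3 syntax, added here as a related convenience; the names a, b, first, and rest are hypothetical.

```python
tup = (1, 2, 3)

# Matching number of variables: unpacking succeeds
x, y, z = tup
print(x, y, z)    # 1 2 3

# Too few variables: Python raises a ValueError
try:
    a, b = tup
except ValueError as error:
    print('ValueError:', error)

# A starred name collects the remaining elements into a list
first, *rest = tup
print(first, rest)    # 1 [2, 3]
```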
In []: tup
Out[]: (1, 2, 3)
# Returns the index of value '3'.
In []: tup.index(3)
Out[]: 2
In []: tupl.count(1)
Out[]: 5
Some of the reasons why tuples are useful are given below:
5.4 Lists
A list is a data structure that holds an ordered collection of items, i.e. we
can store a sequence of items in a list. In Python, lists are created by
placing all items within square brackets [], separated by commas.
A list can have any number of items, and they may be of different data
types. Lists can be created in the following manner:
# Empty list
In []: list_a = []
In []: list_a
Out[]: []
In []: list_b
Out[]: [1, 2, 3]
# List with mixed data types
In []: list_c = [1, 2.5, 'hello']
In []: list_c
Out[]: [1, 2.5, 'hello']
A list can also have another list as an item. This is called a nested list.
In []: stock_list
Out[]: ['HP', 'GOOG', 'TSLA', 'MSFT', 'AAPL', 'AMZN',
'NFLX']
# Updating the first element
In []: stock_list[0] = 'NVDA'
In []: stock_list
Out[]: ['NVDA', 'GOOG', 'TSLA', 'AMD', 'GE', 'BAC']
In []: stock_list
Out[]: ['HP', 'GOOG', 'MSFT']
In []: stock_list.append('AMZN')
In []: stock_list
Out[]: ['HP', 'GOOG', 'MSFT', 'AMZN']
In the above example, we add a new element using the append() method.
Let’s add multiple elements to the list. In Python, whenever we are to add
multiple literals to any object, we enclose them within a list, i.e. within
square brackets []. The output that we expect is the list with all the new
elements appended.
In []: stock_list
Out[]: ['HP', 'GOOG', 'MSFT', 'AMZN', ['TSLA', 'GE',
'NFLX']]
The output we got is not as per our expectation. Python appended the new
list as a single element to stock_list instead of appending three
different elements. Python provides the extend() method to achieve this.
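The difference between the two methods can be seen side by side in the following sketch. The names appended and extended are hypothetical copies made so that both results can be compared.

```python
stock_list = ['HP', 'GOOG', 'MSFT', 'AMZN']

# append() adds its argument as ONE element, even if it is a list
appended = list(stock_list)
appended.append(['TSLA', 'GE', 'NFLX'])
print(len(appended))    # 5
print(appended[-1])     # ['TSLA', 'GE', 'NFLX']

# extend() adds each item of its argument separately
extended = list(stock_list)
extended.extend(['TSLA', 'GE', 'NFLX'])
print(len(extended))    # 7
print(extended[-1])     # NFLX
```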
In []: stock_list
Out[]: ['HP', 'GOOG', 'MSFT', 'AMZN']
In []: stock_list
Out[]: ['HP', 'GOOG', 'MSFT', 'AMZN', 'TSLA', 'GE',
'NFLX']
In []: stock_list
Out[]: ['HP', 'AAPL', 'GOOG', 'MSFT', 'AMZN', 'TSLA', 'GE',
'NFLX']
# Removing the element 'AAPL'
In []: stock_list.remove('AAPL')
In []: stock_list
Out[]: ['HP', 'GOOG', 'MSFT', 'AMZN', 'TSLA', 'GE',
'NFLX']
• pop() : This function removes and returns the last item in the list. If
we provide the index as an argument, it removes the item at the given
position in the list and returns it. It is optional to provide an argument
here.
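Both forms of pop() are sketched below, assuming the list holds the seven tickers shown earlier. The names last and third are hypothetical.

```python
stock_list = ['HP', 'GOOG', 'MSFT', 'AMZN', 'TSLA', 'GE', 'NFLX']

# Without an argument, pop() removes and returns the last item
last = stock_list.pop()
print(last)          # NFLX

# With an index, pop() removes and returns the item at that position
third = stock_list.pop(2)
print(third)         # MSFT

print(stock_list)    # ['HP', 'GOOG', 'AMZN', 'TSLA', 'GE']
```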
In []: stock_list
Out[]: ['HP', 'GOOG', 'MSFT', 'AMZN', 'TSLA', 'GE']
In []: stock_list
Out[]: ['HP', 'GOOG', 'AMZN', 'TSLA', 'GE']
• index(element) : Returns the index of the first item whose value is
element provided in an argument. Python will throw an error if there
is no such item.
In []: stock_list.index('GOOG')
Out[]: 1
In []: stock_list.index('GE')
Out[]: 4
In []: stock_list
Out[]: ['HP', 'GOOG', 'AMZN', 'TSLA', 'GE', 'GOOG']
• sort() : When called, this method sorts the list in place. It returns
None rather than a new list.
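A short sketch of in-place sorting follows. It also shows the built-in sorted() function, mentioned here as an alternative when the original list should stay unchanged; the names result, original, and ordered are hypothetical.

```python
stock_list = ['HP', 'GOOG', 'AMZN', 'TSLA', 'GE', 'GOOG']

# sort() reorders the list itself and returns None
result = stock_list.sort()
print(result)        # None
print(stock_list)    # ['AMZN', 'GE', 'GOOG', 'GOOG', 'HP', 'TSLA']

# sorted() leaves its argument untouched and returns a new list
original = ['b', 'c', 'a']
ordered = sorted(original)
print(original)      # ['b', 'c', 'a']
print(ordered)       # ['a', 'b', 'c']
```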
In []: stock_list
Out[]: ['AMZN', 'GE', 'GOOG', 'GOOG', 'HP', 'TSLA']
• reverse() : This method reverses the elements of the list and the op-
eration performed will be in place.
# Reversing the elements within the list.
In []: stock_list.reverse()
In []: stock_list
Out[]: ['TSLA', 'HP', 'GOOG', 'GOOG', 'GE', 'AMZN']
In []: stack
Out[]: [1, 5, 6, 4, 5]
Another data structure that can be built using list-like methods is the
queue, where the first element added is the first element retrieved, also
known as First In, First Out (FIFO). Consider a queue at a ticket counter
where people are served according to their arrival sequence, and hence the
first person to arrive is also the first to leave.
It can be created using the append() and popleft() methods. Note that
popleft() is a method of collections.deque, not of the built-in list. For
example,
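A minimal FIFO sketch using collections.deque from the standard library is given below. The element names are arbitrary placeholders.

```python
from collections import deque

# People join the queue at the right-hand end
queue = deque(['A', 'B', 'C'])
queue.append('D')

# The first to arrive is the first to leave (left-hand end)
first = queue.popleft()
second = queue.popleft()

print(first, second)    # A B
print(queue)            # deque(['C', 'D'])
```

A plain list could emulate this with pop(0), but removing from the front of a list requires shifting every remaining element, which is why deque is generally preferred for queues.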
5.5 Dictionaries
A Python dictionary is a collection of items stored as key-value pairs.
A dictionary is like a phone book where we can find the phone number or
contact details of a person by knowing only their name, i.e. we associate
names (keys) with corresponding details (values). Note that the keys must
be unique, just as in a phone book, i.e. we cannot have two persons with
the exact same name.
In a dictionary, pairs of keys and values are specified within curly brackets
{} using the following notation:
dictionary = {key1 : value1, key2 : value2, key3 : value3}
Notice that each key is separated from its value by a colon : and the pairs
themselves are separated by commas. Also, we can use only immutable
(hashable) objects like strings, numbers, and tuples for the keys of a
dictionary. Values of a dictionary can be either mutable or immutable
objects. Dictionaries that we create are instances of the dict class. Before
Python 3.7 the order of keys was not guaranteed, so the order in which keys
are added doesn’t necessarily reflect the order in which they are retrieved;
from Python 3.7 onwards, dictionaries preserve insertion order.
In []: type(tickers)
Out[]: dict
In []: type(tickers)
Out[]: dict
In []: tickers = {'GOOG' : 'Alphabet Inc.',
...: 'AAPL' : 'Apple Inc.',
...: 'MSFT' : 'Microsoft Corporation'}
In []: tickers
Out[]:
{'GOOG': 'Alphabet Inc.',
'AAPL': 'Apple Inc.',
'MSFT': 'Microsoft Corporation'}
Keys in a dictionary should be unique. If we supply the same key for mul-
tiple pairs, Python will ignore the previous value associated with the key
and only the recent value will be stored. Consider the following example:
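This behaviour can be reproduced with a two-line sketch, assuming a dictionary named same_keys in which the key symbol is supplied twice.

```python
# The same key appears twice; only the last value is kept
same_keys = {'symbol': 'AAPL', 'symbol': 'GOOG'}

print(same_keys)         # {'symbol': 'GOOG'}
print(len(same_keys))    # 1
```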
In []: same_keys
Out[]: {'symbol': 'GOOG'}
In the above example, Python discarded the value AAPL and retained the
latest value assigned to the same key. Once we have created dictionaries,
we can access their values with the help of the respective keys. We use
square brackets [] to access the values; however, instead of an index, we
supply a key to obtain its value. With the dictionaries created above, we
can access values in the following manner:
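A minimal sketch of key-based access is shown below, using the tickers dictionary from earlier in this section. The lookup of 'TSLA' and the use of get() with a default are additions made for illustration.

```python
tickers = {'GOOG': 'Alphabet Inc.',
           'AAPL': 'Apple Inc.',
           'MSFT': 'Microsoft Corporation'}

# Access a value by its key
print(tickers['AAPL'])    # Apple Inc.

# A key that does not exist raises a KeyError
try:
    tickers['TSLA']
except KeyError as error:
    print('KeyError:', error)

# get() returns a default instead of raising an error
print(tickers.get('TSLA', 'not found'))    # not found
```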
In []: ticker
Out[]:
{'symbol': 'AAPL',
'price': 224.95,
'company': 'Apple Inc',
'founded': 1976,
'products': ['Macintosh', 'iPod', 'iPhone', 'iPad']}
In []: tickers
Out[]:
{'AAPL': {'name': 'Apple Inc.', 'price': 224.95},
'GOOG': {'name': 'Alphabet Inc.', 'price': 1194.64}}
Out[]: {'name': 'Apple Inc.', 'price': 224.95}
In []: ticker['price']
Out[]: 224.95
In []: ticker['price']
Out[]: 226
A new key-value pair can also be added in a similar fashion. To add a new
element, we write the new key inside the square brackets [] and assign a
new value. For example:
In []: ticker
Out[]:
{'symbol': 'AAPL',
'price': 226,
'company': 'Apple Inc',
'founded': 1976,
'products': ['Macintosh', 'iPod', 'iPhone', 'iPad'],
'founders': ['Steve Jobs', 'Steve Wozniak',
'Ronald Wayne']}
In the above example, we add the key founders and assign the list ['Steve
Jobs', 'Steve Wozniak', 'Ronald Wayne'] as its value. If we are to delete
any key-value pair in the dictionary, we use the del statement as
follows:
In []: del ticker['founders']
In []: ticker
Out[]:
{'symbol': 'AAPL',
'price': 226,
'company': 'Apple Inc',
'founded': 1976,
'products': ['Macintosh', 'iPod', 'iPhone', 'iPad']}
In []: len(ticker)
Out[]: 5
In []: len(tickers)
Out[]: 2
Now we discuss some of the popular methods provided by the dict class.
In []: ticker.items()
Out[]: dict_items([('symbol', 'AAPL'), ('price', 226),
('company', 'Apple Inc'),
('founded', 1976),
('products', ['Macintosh', 'iPod',
'iPhone', 'iPad'])])
In []: ticker.keys()
Out[]: dict_keys(['symbol', 'price', 'company', 'founded',
'products'])
In []: ticker.values()
Out[]: dict_values(['AAPL', 226, 'Apple Inc', 1976,
['Macintosh', 'iPod', 'iPhone',
'iPad']])
• pop() : This method removes the item whose key is given as an argument
and returns its value.
In []: tickers
Out[]:
{'GOOG': 'Alphabet Inc.',
'AAPL': 'Apple Inc.',
'MSFT': 'Microsoft Corporation'}
In []: tickers.pop('GOOG')
Out[]: 'Alphabet Inc.'
In []: tickers
Out[]: {'AAPL': 'Apple Inc.',
'MSFT': 'Microsoft Corporation'}
• copy() : As the name suggests, this method returns a (shallow) copy of
the calling dictionary.
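The following sketch shows what copy() does and does not duplicate. The dictionary contents are a shortened, hypothetical version of the ticker example.

```python
ticker = {'symbol': 'AAPL', 'products': ['iPod', 'iPhone']}

# copy() creates a new dictionary object
aapl = ticker.copy()
print(aapl is ticker)        # False

# Adding a key to the copy does not affect the original
aapl['price'] = 224.95
print('price' in ticker)     # False

# The copy is shallow: mutable values are shared by both dictionaries
aapl['products'].append('iPad')
print(ticker['products'])    # ['iPod', 'iPhone', 'iPad']
```

When fully independent nested values are needed, copy.deepcopy() from the standard library can be used instead.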
In []: aapl
Out[]:
{'symbol': 'AAPL',
'price': 224.95,
'company': 'Apple Inc',
'founded': 1976,
'products': ['Macintosh', 'iPod', 'iPhone', 'iPad']}
In []: ticker.clear()
In []: ticker
Out[]: {}
• update() : This method adds the key-value pairs from another
dictionary, overwriting the values of any keys that already exist.
In []: new_tickers = {}
In []: new_tickers.update(ticker1)
In []: new_tickers.update(ticker2)
In []: new_tickers
Out[]: {'NFLX': 'Netflix', 'AMZN': 'Amazon'}
5.6 Sets
A set is an unordered and unindexed collection of items. It is a collection
data type which is mutable, iterable and contains no duplicate values. A
set in Python represents the mathematical notion of a set.
In Python sets are written using the curly brackets in the following way:
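A minimal sketch of creating sets is given below. The duplicate 'AAPL' is included on purpose, and the names empty_set and empty_dict are hypothetical.

```python
# Duplicates are dropped automatically
universe = {'AAPL', 'GOOG', 'GE', 'NFLX', 'AAPL'}
print(len(universe))       # 4
print(type(universe))

# Empty curly brackets create a dictionary, not a set
empty_dict = {}
empty_set = set()
print(type(empty_dict), type(empty_set))
```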
In []: universe
Out[]: {'AAPL', 'GE', 'GOOG', 'NFLX'}
In []: universe.add('AMZN')
In []: universe
Out[]: {'AAPL', 'AMZN', 'GE', 'GOOG', 'NFLX'}
Python won’t add the same item again nor will it throw any error.
In []: universe.add('AMZN')
In []: universe.add('GOOG')
In []: universe
Out[]: {'AAPL', 'AMZN', 'GE', 'GOOG', 'NFLX'}
In order to add multiple items, we use the update() method with new items
to be added within a list.
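A short sketch of update() follows, assuming the set contains the five tickers shown above. 'GOOG' is passed again on purpose to show that existing items are not duplicated.

```python
universe = {'AAPL', 'AMZN', 'GE', 'GOOG', 'NFLX'}

# Add several items at once; 'GOOG' is already present
universe.update(['FB', 'TSLA', 'GOOG'])

# Sets are unordered, so we sort before printing
print(sorted(universe))
print(len(universe))    # 7
```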
In []: universe
Out[]: {'AAPL', 'AMZN', 'FB', 'GE', 'GOOG', 'NFLX', 'TSLA'}
We can use the built-in len() function to determine the length of a set.
In []: len(universe)
Out[]: 7
In []: universe.remove('FB')
In []: universe.discard('TSLA')
In []: universe
Out[]: {'AAPL', 'AMZN', 'GE', 'GOOG', 'NFLX'}
If we use remove() to remove an item which is not present in the set,
Python will throw an error.
In []: universe.remove('FB')
Traceback (most recent call last):
KeyError: 'FB'
The discard() method will not throw any error if we try to discard an item
which is not present in the set.
In []: universe
Out[]: {'AAPL', 'AMZN', 'GE', 'GOOG', 'NFLX'}
In []: universe.discard('FB')
In []: universe.clear()
In []: universe
Out[]: set()
In []: universe
Out[]: {'AAPL', 'AMD', 'BAC', 'BMO', 'GOOG', 'JPLS', 'WDC'}
Out[]: True
perform basic array operations.
6. A tuple can hold multiple values of different types within it separated
by commas. Tuples are enclosed within parentheses () and they are
immutable.
7. A list holds an ordered collection of items. Lists are created by plac-
ing all items within square brackets [] separated by a comma. They
can also be used to implement other data structures like stacks and
queues.
8. A list can also have another list as an item. This is called a nested list.
9. Dictionary stores data in the form of a key-value pair. It can be created
using the curly brackets {}. Element/Pair within the dictionary is
accessed using the corresponding keys instead of an index.
10. Sets are an unordered data structure created using the curly brackets
{}. They cannot contain duplicate elements.
Chapter 6
• The as keyword is used to create an alias. Consider the following
example where we create an alias for the calendar module as c while
importing it. Once aliased we can refer to the imported module with
its alias.
In []: c.isleap(2019)
Out[]: False
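The import statement that creates the alias is not shown above. A short sketch using the standard calendar module:

```python
import calendar as c   # 'c' is now an alias for the calendar module

print(c.isleap(2019))  # False
print(c.isleap(2020))  # True
```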
AssertionError
• The break keyword is used to break a for loop and while loop.
# Output
0
1
2
3
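The loop that produced this output is not shown. One loop that would print the same values, assuming a range of 10 and a cut-off at 4 (both chosen only for illustration):

```python
printed = []
for i in range(10):
    if i == 4:
        break          # leave the loop as soon as i reaches 4
    printed.append(i)
    print(i)
```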
• The class keyword is used to create a class.
In []: s1 = stock()
In []: s1.name
Out[]: 'AAPL'
In []: s1.price
Out[]: 224.61
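The class definition behind this session is not shown. A minimal sketch, assuming name and price are simple class attributes:

```python
class stock:
    # Class attributes with the values seen in the session above
    name = 'AAPL'
    price = 224.61

s1 = stock()
print(s1.name, s1.price)
```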
In []: def python_function():
   ...:     print('Hello! This is a Python Function.')
In []: python_function()
Hello! This is a Python Function.
• The del keyword is used to delete objects. It can also be used to delete
variables, lists, or elements from data structures, etc.
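A brief sketch of the three uses of del mentioned above. The variable names are chosen only for illustration:

```python
stocks = ['AAPL', 'HP', 'MSFT', 'GOOG']
del stocks[1]             # delete one element from a list
print(stocks)             # ['AAPL', 'MSFT', 'GOOG']

prices = {'AAPL': 193.53, 'HP': 24.16}
del prices['HP']          # delete a key-value pair from a dictionary
print(prices)             # {'AAPL': 193.53}

temp = 42
del temp                  # delete the variable itself
try:
    temp
except NameError:
    print('temp is no longer defined')
```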
• The else keyword is used to define the code block that will be executed
when all if and elif conditions above it fail. It does not check any
condition; it just executes the code if all the conditions above it fail.
In []: number = 5
   ...:
   ...: if number < 5:
   ...:     print('Number is less than 5')
   ...: elif number == 5:
   ...:     print('Number is equal to 5')
   ...: else:
   ...:     print('Number is greater than 5')
   ...:
# Output
Number is equal to 5
• The try keyword is used to define the try...except code block, which
is followed by the code block defined by the except keyword.
Python tries to execute the code within the try block and, if it executes
successfully, skips the except block.
# Output
Something went wrong.
   ...:     print('Something went wrong.')
   ...: finally:
   ...:     print('The try...except code is finished')
   ...:
# Output
Something went wrong.
The try...except code is finished
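The beginning of this example is not shown. A self-contained sketch that produces the same two messages, assuming a division by zero as the failing operation:

```python
messages = []
try:
    result = 10 / 0                      # raises ZeroDivisionError
except ZeroDivisionError:
    messages.append('Something went wrong.')
finally:
    # The finally block runs whether or not an exception occurred
    messages.append('The try...except code is finished')

print('\n'.join(messages))
```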
In []: int(False)
Out[]: 0
• The True keyword is used to represent the boolean result true. It eval-
uates to 1 when cast to an integer value.
In []: int(True)
Out[]: 1
# Output
0
1
2
3
4
• The import keyword is used to import external libraries and modules
into the current program.
# Output
MSFT
• The is keyword is used to test if two variables refer to the same
object in Python. It returns True if the two variables are the same
object, False otherwise.
In []: stock_list = ['GOOG', 'MSFT', 'NFLX', 'TSLA']
In []: y = stock_list
In []: y is stock_list
Out[]: True
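To see how is differs from ==, it may help to compare a second name for the same list with a copy of it. The name z is introduced only for this sketch:

```python
stock_list = ['GOOG', 'MSFT', 'NFLX', 'TSLA']
y = stock_list            # y is another name for the same list object
z = list(stock_list)      # z is a new list with equal contents

print(y is stock_list)    # True
print(z is stock_list)    # False
print(z == stock_list)    # True
```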
In []: y
Out[]: ['TSLA', 'NFLX', 'MSFT', 'GOOG']
In []: addition(5, 2)
Out[]: 7
• The None keyword is used to define a null value, or no value at all. It
is not the same as 0, False, or an empty string. None is represented by
the datatype NoneType.
In []: x = None
In []: type(x)
Out[]: NoneType
# Output
GOOG
In []: x = 'Python'
raise TypeError('Only integers are allowed')
In []: addition(2, 3)
Out[]: 5
• The with keyword is used to wrap the execution of a block with methods
defined by a context manager. It simplifies exception handling
by encapsulating common preparation and cleanup tasks. For example,
the file object returned by the open() function is a context manager,
which allows opening a file, keeping it open as long as the execution is
within the with block, and closing it as soon as we leave the block.
Simply put, resources are acquired by the with statement and released
when we leave the with block.
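A small sketch of the with statement applied to a file. The file name tickers.txt and the temporary directory are assumptions made only so that the example is self-contained:

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), 'tickers.txt')

with open(path, 'w') as f:     # the file is open inside this block
    f.write('AAPL\nMSFT\n')

print(f.closed)                # True: closed automatically on leaving the block

with open(path) as f:
    tickers = f.read().split()
print(tickers)                 # ['AAPL', 'MSFT']
```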
6.2 Operators
Operators are constructs or special symbols which can manipulate or com-
pute the values of operands in an expression. In other words, they are used
to perform operations on variables and values. Python provides a bunch
of different operators to perform a variety of operations. They are broadly
categorized into the following:
In []: 5 + 3
Out[]: 8
In []: 5 - 2
Out[]: 3
In []: 5 * 2
Out[]: 10
In []: 10 / 2
Out[]: 5.0
• % : This is a modulus operator. It returns the remainder of the division
operation.
In []: 16 % 5
Out[]: 1
In []: 3 ** 2
Out[]: 9
In []: b = 3
In []: x = 5
In []: y = 8
• == : This is an equal to operator used to check whether two values are
equal or not. It returns true if values are equal, false otherwise.
In []: a == x
Out[]: True
In []: a == b
Out[]: False
In []: a != x
Out[]: False
In []: a != b
Out[]: True
• > : This is a greater than operator used to check whether one value is
greater than another value. It returns true if the first value is greater
compared to the latter, false otherwise.
In []: y > x
Out[]: True
In []: b > y
Out[]: False
• < : This is a less than operator used to check whether one value is less
than another value. It returns true if the first value is less compared
to the latter, false otherwise.
In []: y < x
Out[]: False
In []: b < y
Out[]: True
• >= : This is a greater than or equal to operator used to check whether
one value is greater than or equal to another value or not. It returns
true if the first value is either greater than or equal to the latter value,
false otherwise.
In []: a >= x
Out[]: True
In []: y >= a
Out[]: True
In []: b >= x
Out[]: False
• <= : This is a less than or equal to operator used to check whether one
value is less than or equal to another value or not. It returns true
if the first value is either less than or equal to the latter value, false
otherwise.
In []: a <= x
Out[]: True
In []: y <= a
Out[]: False
In []: b <= x
Out[]: True
In []: 8 >= 8 and 5 < 5
Out[]: False
In []: 5 == 5 or 3 > 5
Out[]: True
• not : This operator reverses the result. It returns true if the result is
false, and vice versa.
In []: 3 == 3
Out[]: True
In []: not 3 == 3
Out[]: False
In []: 3 != 3
Out[]: False
In []: not 3 != 3
Out[]: True
number instead of a number. Binary numbers are represented by a combi-
nation of 0 and 1. For better understanding, we define following numbers
(integers) and their corresponding binary numbers.
Number Binary
201 1100 1001
15 0000 1111
In the above example, both 201 and 15 are represented by 8 bits. Bitwise
operators work on multi-bit values, but conceptually one bit at a time. In
other words, these operators work on the 0 and 1 representation of the
underlying numbers.
• & : This is a bitwise AND operator that returns 1 only if both of its
inputs are 1, 0 otherwise. Below is the truth table for the & operator
with four bits.
Bits 1234
Input 1 0011
Input 2 0101
& Output 0001
We can compute the bitwise & operation between 201 and 15 as fol-
lows:
In []: 201 & 15
Out[]: 9
Let us understand with the help of a truth table, how Python returned
the value 9.
Binary Numbers
Input 1 201 1100 1001
Input 2 15 0000 1111
& Output 9 0000 1001
Python evaluated the & operation on each bit of the inputs and returned
the integer equivalent of the binary output. In the above example, the
decimal equivalent of 0000 1001 is 9.
Bits 1234
Input 1 0011
Input 2 0101
| Output 0111
Binary Numbers
Input 1 201 1100 1001
Input 2 15 0000 1111
Output 207 1100 1111
• ^ : This is a bitwise XOR operator that returns 1 only if exactly one of
its inputs is 1, 0 otherwise. Below is the truth table for the XOR operation.
Bits 1234
Input 1 0011
Input 2 0101
^ Output 0110
Notice that it does not return 1 if both inputs are 1. The bitwise XOR can
be performed as follows:
In []: 201 ^ 15
Out[]: 198
The output returned by Python can be verified via its corresponding
truth table as shown below.
Binary Numbers
Input 1 201 1100 1001
Input 2 15 0000 1111
^ Output 198 1100 0110
Bits 12
Input 01
Output 10
• << : This is a bitwise left shift operator. It takes two inputs: number to
operate on and number of bits to shift. It shifts bits to the left by pushing
zeros in from the right. Because Python integers have no fixed width, no
leftmost bits are lost. Consider the following example:
In []: 15 << 2
Out[]: 60
In the above example, we are shifting the number 15 left by 2 bits. The
first input refers to the number to operate on and the second input
refers to the number of bits of shift. We compute the truth table for
the above operation as below:
Binary
Input 15 0000 1111
<< Output 60 0011 1100
• >> : Similar to the left shift operator, we have a right shift operator
that shifts bits to the right and fills zeros on the left. While shifting
bits to the right, it lets the rightmost bits fall off and adds new zeros
to the left.
In []: 201 >> 2
Out[]: 50
In the above example, the number 201 gets shifted right by 2 bits and
we get 50 as the output. The result can be verified by the following
truth table.
Binary Numbers
Input 201 1100 1001
>> Output 50 0011 0010
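The truth tables in this section can be checked directly in Python by printing each result in binary. The helper bits() is a hypothetical name introduced for this sketch; it relies only on the built-in format() function:

```python
def bits(n):
    """Return n as an 8-bit binary string, e.g. 9 -> '0000 1001'."""
    s = format(n, '08b')
    return s[:4] + ' ' + s[4:]

print(bits(201 & 15))   # 0000 1001 -> 9
print(bits(201 | 15))   # 1100 1111 -> 207
print(bits(201 ^ 15))   # 1100 0110 -> 198
print(bits(15 << 2))    # 0011 1100 -> 60
print(bits(201 >> 2))   # 0011 0010 -> 50
```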
• = : This operator assigns the value on its right side to the operand on
its left.
In []: a = 5
In []: b = 3
We can also use this operator to assign multiple values to multiple
operands on the left side. The number of values and operands must be the
same on both sides; otherwise Python will raise an error.
In []: a, b = 5, 3
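A short sketch of multiple assignment, including what happens when the two sides do not match. The swap is added here as a common use of the same syntax:

```python
a, b = 5, 3
a, b = b, a              # swap without a temporary variable
print(a, b)              # 3 5

try:
    c, d = 1, 2, 3       # three values, two names
except ValueError as err:
    print('Error:', err)
```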
• += : This operator adds the operand on the right side to the operand
on the left side and assigns the result back to the same operand on the
left side.
In []: a += 2
In []: print(a)
Out[]: 7
• -= : This operator subtracts the operand on the right side from the
operand on the left side and assigns the result back to the same
operand on the left side.
In []: a -= 2
In []: print(a)
Out[]: 5
In []: a *= 2
In []: print(a)
Out[]: 10
• /= : This operator divides the operand on the left side by the operand
on the right side and assigns the result back to the same operand on
the left side.
In []: a /= 3
In []: print(a)
Out[]: 3.3333333333333335
In []: a = 10
In []: a %= 3
In []: print(a)
Out[]: 1
In []: a **= 3
In []: print(a)
Out[]: 8
• //= : This operator divides the left operand by the right operand
and then assigns the result (floored to the nearest lower integer) to the
operand on the left.
In []: a = 10
In []: a //= 4
In []: print(a)
Out[]: 2
• &= : This operator performs the bitwise ’AND’ operation between the
operands and then assigns the result to the operand on the left side.
In []: a = 0
In []: a &= 1
In []: a = 0
In []: a |= 1
In []: a = 1
In []: a ^= 1
• >>= : This operator shifts bits of the left operand to the right specified
by the right operand and then assigns the new value to the operand
on the left side.
In []: a = 32
In []: a >>= 2
In []: print(a)
Out[]: 8
• <<= : This operator shifts the bits of the left operand to the left by
the number of positions specified by the right operand and then assigns
the new value to the operand on the left side.
In []: a = 8
In []: a <<= 2
In []: print(a)
Out[]: 32
• not in : This operator returns True if a value does not exist in a
sequence, False otherwise.
• is : This operator returns True if both operands refer to the same
object, False otherwise.
In []: a = 3
In []: b = a
In []: x = 3
• is not : This operator returns True if the operands do not refer to the
same object, False otherwise.
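The membership and identity operators can be tried together as follows. Lists are used here instead of small integers because whether two equal integers share one object is an implementation detail:

```python
stocks = ['AAPL', 'MSFT', 'GOOG']
print('AAPL' in stocks)        # True
print('TSLA' not in stocks)    # True

a = [1, 2, 3]
b = a                          # same object as a
x = [1, 2, 3]                  # equal contents, different object
print(a is b)                  # True
print(a is not x)              # True
print(a == x)                  # True
```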
In []: 2 + 5 - 3 + 1
Out[]: 5
In the above example, Python first evaluates 2 + 5, resulting in 7, then
subtracts 3 from it to get 4, and finally adds 1 to obtain the final result
of 5. But this is not always the case. If we include other operators, Python
may evaluate the expression in a different order. For example,
In []: 2 + 5 * 3
Out[]: 17
Operators                 Precedence
()                        Parentheses
**                        Exponential
+, -, ~                   Positive, Negative, Bitwise NOT
*, /, //, %               Multiplication, Division,
                          Floor Division, Modulus
+, -                      Addition, Subtraction
<<, >>                    Bitwise Left Shift, Bitwise Right Shift
&                         Bitwise AND
^                         Bitwise XOR
|                         Bitwise OR
==, !=, >, >=, <, <=,     Comparison, Identity,
is, is not, in, not in    Membership Operators
not                       Logical NOT
and                       Logical AND
or                        Logical OR
As the above table lists () with the highest precedence, parentheses can
be used to give any part of an expression the highest precedence. Any
expression written inside parentheses () is evaluated first.
In []: (5 / 2) * (2 + 5)
Out[]: 17.5
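A few more expressions illustrate the rows of the precedence table. These examples are not from the original text and are only meant as a quick check:

```python
print(2 + 5 * 3)               # 17: multiplication before addition
print((2 + 5) * 3)             # 21: parentheses first
print(2 ** 3 ** 2)             # 512: ** groups right to left, 2 ** (3 ** 2)
print(-2 ** 2)                 # -4: ** binds tighter than unary minus
print(not 1 == 2 and 3 > 2)    # True: comparisons, then not, then and
```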
Chapter 7
The code we write gets executed in the order in which it is written. In
other words, a program’s control flow is the order in which the program’s
code executes. Using conditional statements such as if statements, and loops,
we can define or alter the execution order of the code. This section covers
a conditional if statement and for and while loops; functions are covered
in the upcoming section. Raising and handling exceptions also affects the
control flow which will be discussed in subsequent sections.
Let us consider a scenario where we want to go long on a stock if
buy_condition is True.
# Input
if buy_condition_1 == True and rsi_indicator <= 20:
    position = 'Buy'
Similar to the above scenario, we can compound the if condition to be as
complex as we want it to be, using different combinations of logical opera-
tors.
# Input
if buy_condition_1 == True and rsi_indicator <= 20:
    position = 'Buy'
elif sell_condition_1 and rsi_indicator >= 80:
    position = 'Sell'
During the execution, the interpreter will first check whether the conditions
listed by the if statement hold true or not. If they are true, the code within
the if block will be executed. Otherwise, the interpreter will check the
conditions listed by the elif statement and, if they are true, the code
within the elif block will be executed. If they are false, the interpreter
will execute the code following the elif block. It is also possible to have
multiple elif blocks; the interpreter will keep checking the conditions
listed by each elif clause in order and execute the first code block whose
conditions hold true.
# Input
if buy_condition_1 == True and rsi_indicator <= 20:
    position = 'Buy'
elif sell_condition_1 and rsi_indicator >= 80:
    position = 'Sell'
else:
    position = 'None'
In the above example, if the conditions listed by the if and elif clauses
are false, the code within the else block gets executed and the variable
position will be assigned a value 'None'.
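To try all three branches, the same logic can be wrapped in a function and called with sample inputs. The function name decide_position and the input values are assumptions made for this sketch:

```python
def decide_position(buy_condition_1, sell_condition_1, rsi_indicator):
    """Return 'Buy', 'Sell' or 'None' following the if/elif/else logic."""
    if buy_condition_1 and rsi_indicator <= 20:
        return 'Buy'
    elif sell_condition_1 and rsi_indicator >= 80:
        return 'Sell'
    else:
        return 'None'

print(decide_position(True, False, 15))    # Buy
print(decide_position(False, True, 85))    # Sell
print(decide_position(False, False, 50))   # None
```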
7.2 Loops
Let us consider a scenario where we want to compare the value of the
variable rsi_indicator multiple times. To address this situation, we
need to update the variable each time manually and check it with the if
statement. We keep repeating this until we check all the values that we
are interested in. Another approach we can use is to write multiple if
conditions for checking multiple values. The first approach is botched and
cumbersome, whereas the latter is practically non-feasible.
The approach we are left with is to have a range of values that need to be
logically compared, check each value and keep iterating over them. Python
allows us to implement such approach using loops or more precisely the
while and for statements.
# Input
data_points = 6
count = 0
# Output
0
1
2
3
4
5
When the above code is run, the interpreter will first check the conditional
expression of the while loop. If the expression is true, it will enter the
loop and execute the code statements within the loop. The interpreter will
keep executing the code within the loop until the condition becomes false.
Once the condition is false, the interpreter will stop executing the code
within the loop and move to the next code statement. A while statement can
have an optional else clause. Continuing the above example, we can add the
else clause as shown in the below example:
# Input
data_points = 6
count = 0
In the above example, the interpreter will execute the while loop as we
discussed above. Additionally, when the condition becomes false, the
interpreter will also execute the else clause and the output will be as
follows:
# Output
0
1
2
3
4
5
The while loop is over.
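The body of the while loop is not shown above. A sketch that reproduces this output, assuming the loop condition is count < data_points:

```python
data_points = 6
count = 0

while count < data_points:   # assumed condition; the loop runs while it is True
    print(count)
    count += 1               # without this update the loop would never end
else:
    # Runs once the condition is False; skipped if the loop ends via break
    print('The while loop is over.')
```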
The for statement is also known as for..in loop in Python. The item in
the above syntax is the placeholder for each item in the sequence.
The range() function returns a sequence of numbers, starting from 0 by
default, incrementing by 1 (by default), and ending before a specified
number. The syntax of range() is as follows:
range(start, stop, step)
Parameter Values:-
start : Optional. An integer specifying at which number to
start. The default is 0.
stop : Required. An integer specifying the number before which to
stop. The stop value itself is not included.
step : Optional. An integer specifying the incrementation.
The default is 1.
The range() function can be used along with the for loop as follows:
# Input
for i in range(5):
    print(i)
Here, we have provided only the stop parameter value as 5. Hence, the
range() function will start with 0 and end at 4, providing us with a
sequence of 5 numbers. The output of the above for loop will be the following:
# Output
0
1
2
3
4
In the above for loop, the variable i will take the value of 0 generated
by the range() function for the first iteration and execute the code block
following it. For the second iteration, the variable i will take the value of
1 and again execute the code block following it and such repetition will
continue until the last value is yielded by the range() function.
It is also possible to use various combinations of the start, stop and step pa-
rameters in a range() function to generate any sort of sequence. Consider
the following example:
# Input
for i in range(1, 10, 2):
    print(i)
The above range() function will generate the sequence starting from 1 up
to, but not including, 10 with an increment of 2 and the output will be the
following:
# Output
1
3
5
7
9
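Wrapping range() in list() is a quick way to inspect the sequence it generates. The last two calls are extra illustrations, not taken from the text:

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(1, 10, 2)))   # [1, 3, 5, 7, 9]
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]  (negative step counts down)
print(list(range(5, 5)))       # []  (start equals stop: empty sequence)
```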
# Input
top_gainers = ['BHARTIARTL', 'EICHERMOT', 'HCLTECH',
'BAJFINANCE', 'RELIANCE']
Here the for loop will iterate over the list top_gainers and it will print each
item within it along with their corresponding index number. The output of
the above for loop is shown below:
# Output
0 : BHARTIARTL
1 : EICHERMOT
2 : HCLTECH
3 : BAJFINANCE
4 : RELIANCE
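The loop itself is not shown above. One way to obtain this output, assuming the index numbers come from range(len(...)):

```python
top_gainers = ['BHARTIARTL', 'EICHERMOT', 'HCLTECH',
               'BAJFINANCE', 'RELIANCE']

lines = []
for i in range(len(top_gainers)):        # i runs over the index numbers
    lines.append(f'{i} : {top_gainers[i]}')
    print(i, ':', top_gainers[i])
```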
7.2.5 Looping through strings
Strings in Python are iterable objects. In other words, strings are a sequence
of characters. Hence, we can use a string as a sequence object in the for
loop.
volume_name = 'Python'
We initialize the string volume_name with the value 'Python' and provide
it as an iterable object to the for loop. The for loop yields each character
from it and prints the respective character using the print statement.
The output is shown below:
# Output
P
y
t
h
o
n
dict = {'AAPL':193.53,
'HP':24.16,
'MSFT':108.29,
'GOOG':1061.49}
If we execute the command dict.items() directly, Python will return a
collection of the dictionary's items (in the form of tuples), as shown below:
# Input
dict.items()
# Output
dict_items([('AAPL', 193.53), ('HP', 24.16),
('MSFT', 108.29), ('GOOG', 1061.49)])
As we are iterating over tuples, we need to fetch a key and value for each
item in the for loop. We fetch the key and value of each item yielded by
the dict.items() method in the key and value variables and the output is
shown below:
# Output
Price of AAPL is 193.53
Price of HP is 24.16
Price of MSFT is 108.29
Price of GOOG is 1061.49
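The loop that prints these lines is not shown. A sketch under the assumption that each item is unpacked into key and value; the dictionary is named prices here to avoid shadowing the built-in dict type:

```python
prices = {'AAPL': 193.53, 'HP': 24.16, 'MSFT': 108.29, 'GOOG': 1061.49}

lines = []
for key, value in prices.items():   # each item is a (key, value) tuple
    lines.append(f'Price of {key} is {value}')
    print(lines[-1])
```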
A for loop can also have an optional else statement which gets executed
once the for loop completes iterating over all items in a sequence without
encountering a break. A sample for loop with an optional else statement is
shown below:
The above for loop prints five statements and once it completes iterating
over the range() function, it will execute the else clause and the output
will be the following:
# Output
This is 1.
This is 2.
This is 3.
This is 4.
This is 5.
For loop is over!
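A loop consistent with this output, assuming it iterates over range(1, 6). The list lines is introduced only to collect the messages:

```python
lines = []
for i in range(1, 6):
    lines.append(f'This is {i}.')
else:
    lines.append('For loop is over!')   # runs because no break occurred
print('\n'.join(lines))
```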
The first for loop defines the range for table from 1 to 9. Similarly, the
second or the inner for loop defines the multiplier value from 1 to 10. The
print() in the inner loop has parameter end=' ' which appends a space in-
stead of default new line. Hence, answers for a particular table will appear
in a single row. The output for the above nested loops is shown below:
# Output
1 2 3 4 5 6 7 8 9 10
2 4 6 8 10 12 14 16 18 20
3 6 9 12 15 18 21 24 27 30
4 8 12 16 20 24 28 32 36 40
5 10 15 20 25 30 35 40 45 50
6 12 18 24 30 36 42 48 54 60
7 14 21 28 35 42 49 56 63 70
8 16 24 32 40 48 56 64 72 80
9 18 27 36 45 54 63 72 81 90
For loop is over!
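The nested loops themselves are not shown. A sketch that matches the description and the output, assuming the loop variables are named table_value and multiplier and that the outer loop carries the else clause:

```python
rows = []
for table_value in range(1, 10):         # tables 1 to 9
    row = []
    for multiplier in range(1, 11):      # multipliers 1 to 10
        answer = table_value * multiplier
        row.append(answer)
        print(answer, end=' ')           # stay on the same line
    rows.append(row)
    print()                              # start a new line after each table
else:
    print('For loop is over!')
```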
The same scenario can also be implemented using the while nested loops
as given below and we will get the same output shown above.
# Input
table_value = 1
In Python it is also possible to nest different loops together. That is, we can
nest a for loop inside a while loop and vice versa.
# Input
for item in range(1,10):
    print(f'This is {item}.')
    if item == 6:
        print('Exiting FOR loop.')
        break
print('Not in FOR loop.')
We define a for loop that iterates over a range of 1 to 9 in the above exam-
ple. Python will try to execute the code block following the loop definition,
where it will check if the item under consideration is 6. If true, the inter-
preter will break and exit the loop as soon as it encounters the break state-
ment and starts executing the statement following the loop. The output of
the above loop will be the following:
# Output
This is 1.
This is 2.
This is 3.
This is 4.
This is 5.
This is 6.
Exiting FOR loop.
Not in FOR loop.
# Output
This is 1.
This is 2.
This is 3.
This is 4.
This is 5.
This is 6.
This is 7.
This is 8.
This is 9.
Not in FOR loop.
As seen in the output above, the interpreter didn’t print anything once it
encountered the continue keyword thereby skipping the iteration.
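The loop behind this output is not shown. A sketch of how continue skips the remainder of the loop body for one iteration, assuming the print call comes before the check; the list processed is a hypothetical addition that makes the skipped step visible:

```python
processed = []
for item in range(1, 10):
    print(f'This is {item}.')
    if item == 6:
        continue              # skip the rest of the body for this item
    processed.append(item)    # not reached when item == 6
print('Not in FOR loop.')
print(processed)              # [1, 2, 3, 4, 5, 7, 8, 9]
```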
In Python, loops cannot have an empty body. Suppose we have a loop that
is not implemented yet, but we want to implement it in the future, we can
use the pass statement to construct a body that does nothing.
# Input
stocks = ['AAPL', 'HP', 'MSFT', 'GOOG']
In the loop defined above, Python will just iterate over each item without
producing any output and finally execute the else clause. The output will
be as shown below:
# Output
For loop is over!
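The loop is not shown in full above. A sketch that matches the description, assuming the loop variable is named stock:

```python
stocks = ['AAPL', 'HP', 'MSFT', 'GOOG']
for stock in stocks:
    pass                      # placeholder: body to be written later
else:
    print('For loop is over!')
```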
The output of the above set notation will be cubes of all natural numbers
less than 10. Now let’s look at the corresponding Python code implement-
ing list comprehension.
As we see in the Python code above, list comprehension starts and ends
with square brackets to help us remember that the output will be a list. If
we look closely, it is a for loop embedded in the square bracket. In a general
sense, a for loop works as follows:
As shown above, the syntax for list comprehension starts with the opening
square bracket [ followed by output expression, for loop, and optional
if condition. It has to be ended with the closing square bracket ].
The set defined above can also be implemented using the for loop in the
following way:
cube_list = []
for i in range(0,10):
    cube_list.append(i**3)
# Input
[i**3 for i in range(0,10)]
The output for the for loop and the list comprehension defined above will
be the same shown below:
# Output
[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]
We can filter the output produced by a list comprehension using the condi-
tion part in its construct. Consider the revised set notation given below:
The set defined above contains cubes of all whole numbers which are less
than 20 and even. It can be implemented using the for loop as given below:
# Input
cube_list = []
for i in range(1,20):
    if i%2==0:
        cube_list.append(i**3)
print(cube_list)
# Output
[8, 64, 216, 512, 1000, 1728, 2744, 4096, 5832]
The output we got is in line with the set defined above and the for loop
defined above can be implemented in a single line using the LC construct.
# Input
[i**3 for i in range(1,20) if i%2==0]
# Input
[i for i in range(0,20) if i%2==0 if i%3==0]
#Output
[0, 6, 12, 18]
# Input
[str(i)+': Even' if i%2==0 else str(i)+': Odd' for i in
range(0,6)]
In such a scenario, we need to put the if and else part of a condition before
the for loop in the comprehension. The output of the above construct is as
below:
# Output
['0: Even', '1: Odd', '2: Even', '3: Odd', '4: Even',
'5: Odd']
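The two positions of if inside a comprehension do different jobs, which the following sketch puts side by side. The variable names are chosen only for illustration:

```python
# Filter: the if comes after the for and drops items that fail the test
evens = [i for i in range(0, 6) if i % 2 == 0]

# Conditional expression: if...else comes before the for and keeps every item
labels = ['Even' if i % 2 == 0 else 'Odd' for i in range(0, 6)]

# Chained ifs behave like a single condition joined with `and`
both = [i for i in range(0, 20) if i % 2 == 0 and i % 3 == 0]

print(evens)    # [0, 2, 4]
print(labels)   # ['Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd']
print(both)     # [0, 6, 12, 18]
```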
# Input
for i in range(7,8):
    for j in range(1,11):
        print(f'{i} * {j} = {i * j}')
# Input
[i * j for j in range(1,11) for i in range(7,8)]
Here, the output will only be the result of multiplication as shown below:
# Output
[7, 14, 21, 28, 35, 42, 49, 56, 63, 70]
4. The elif statement can be used in case of multiple mutually exclusive
conditions.
5. The else statement can be used when code needs to be executed if all
previous conditions fail.
6. Loops are used to perform an iterative process. In other words, loops
are used to execute the same code more than one time.
7. A loop can be implemented using: a while statement and a for state-
ment.
8. A counter needs to be coded explicitly for a while loop, else it might
run infinitely.
9. The range() function is used to generate sequences in Python.
10. A loop within a loop is known as a nested loop.
11. A for loop is used to iterate over data structures such as lists, tuples,
dictionaries and string as well.
12. The break keyword is used to break the execution of a loop and di-
rects the execution flow outside the loop.
13. The continue keyword is used to skip the current iteration of a loop
and moves the execution flow to the next iteration.
14. In Python, loops cannot have an empty body.
15. The pass keyword is used as a placeholder in an empty loop.
16. A list comprehension returns a list. It consists of square brackets
containing an expression that gets executed for each element in the
iteration over a loop.
In this section we will explore the natural world of iterators, objects that we
have already encountered in the context of for loops without necessarily
knowing it, followed by its easier implementation via a handy concept of
generators. Let’s begin.
8.1 Iterators
Iterators are everywhere in Python. They are elegantly implemented in for
loop, comprehensions, etc. but they are simply hidden in plain sight. An
iterator is an object that can be iterated upon and which will return data,
one element at a time. It allows us to traverse through all elements of a
collection, regardless of its specific implementation.
8.1.1 Iterables
An iterable is an object, not necessarily a data structure, that can return
an iterator. Its primary purpose is to return all of its elements. An object
is known as iterable if we can get an iterator from it. The iterator it
returns defines two methods:
• __iter__() method, which returns the iterator object itself and is used
by the for and in keywords.
• __next__() method, which returns the next value. It raises the
StopIteration exception once all the elements have been traversed.
Python provides many built-in iterables: lists, tuples, strings,
dictionaries and even files, and we can run a loop over them. It essentially
means we have indirectly used iterators in the previous section while
implementing looping techniques.
All these objects have an __iter__() method, called by the built-in iter()
function, which is used to get an iterator. The code snippet below gets an
iterator from a tuple and prints each value:
In []: next(iterator)
Out[]: 'AAPL'
In []: next(iterator)
Out[]: 'MSFT'
In []: iterator.__next__()
Out[]: 'AMZN'
In []: next(iterator)
Traceback (most recent call last):
StopIteration
We use the next() function to iterate manually through all the items of
an iterator. Also, the next() function will implicitly call the __next__()
method of an iterator as seen in the above example. It will raise
StopIteration error once we reach the end and there is no more data to be
returned.
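The line that creates the iterator is not shown in the session above. A complete sketch, assuming a three-ticker tuple that matches the output:

```python
stocks = ('AAPL', 'MSFT', 'AMZN')   # assumed tuple matching the output above
iterator = iter(stocks)

print(next(iterator))               # AAPL
print(next(iterator))               # MSFT
print(iterator.__next__())          # AMZN

try:
    next(iterator)                  # nothing left to return
except StopIteration:
    print('No more items.')

exhausted = next(iterator, None)    # a default value avoids the exception
print(exhausted)                    # None
```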
We can iterate manually through other iterables like strings and lists, in
a manner similar to the one we used to iterate over the tuple in the above
example. The more elegant and automated way is to use a for loop. The
for loop actually creates an iterator object and calls next() on it for
each iteration.
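Roughly speaking, what a for loop does behind the scenes can be written out by hand with iter(), next() and StopIteration. This is an illustrative equivalent, not how the interpreter is literally implemented:

```python
stocks = ['AAPL', 'MSFT', 'TSLA']

iterator = iter(stocks)        # what the for statement does first
collected = []
while True:
    try:
        item = next(iterator)  # fetch the next element
    except StopIteration:
        break                  # a for loop ends here, silently
    collected.append(item)     # the loop body
print(collected)               # ['AAPL', 'MSFT', 'TSLA']
```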
We are now going to dive a bit deeper into the world of iterators and
iterables by looking at some handy functions, viz. the enumerate() and
zip() functions, and at how to unzip a zipped object.
In []: en_object
Out[]: <enumerate at 0x7833948>
In []: list(en_object)
Out[]: [(0, 'AAPL'), (1, 'MSFT'), (2, 'TSLA')]
The enumerate object itself is also iterable, and we can loop over it while
unpacking its elements using the following clause.
In []: for index, value in enumerate(stocks):
   ...:     print(index, value)
0 AAPL
1 MSFT
2 TSLA
It is the default behaviour to start an index with 0. We can alter this be-
haviour using the start parameter within the enumerate() function.
In []: for index, value in enumerate(stocks, start=10):
   ...:     print(index, value)
10 AAPL
11 MSFT
12 TSLA
In []: print(type(z))
<class 'zip'>
Here, we have two lists company_names and tickers. Zipping them to-
gether creates a zip object which can be then converted to list and looped
over.
In []: z_list
Out[]: [('Apple', 'AAPL'), ('Microsoft', 'MSFT'),
('Tesla', 'TSLA')]
The first element of the z_list is a tuple which contains the first element
of each list that was zipped. The second element of the z_list is a tuple
which contains the second element of each list that was zipped, and so on.
Alternatively, we could use a for loop to iterate over a zip object and
print the tuples.
AAPL = Apple
MSFT = Microsoft
TSLA = Tesla
We could also have used the splat operator(*) to print all the elements.
In []: print(*z)
('Apple', 'AAPL') ('Microsoft', 'MSFT') ('Tesla', 'TSLA')
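The zipping steps above, together with the reverse "unzipping" step, can be sketched as follows. Python has no unzip() function; passing the zipped pairs back to zip() with the * operator is the usual idiom. The names names_again, tickers_again, first_pass and second_pass are introduced only for this illustration:

```python
company_names = ['Apple', 'Microsoft', 'Tesla']
tickers = ['AAPL', 'MSFT', 'TSLA']

z_list = list(zip(company_names, tickers))
for company, ticker in zip(company_names, tickers):
    print(ticker, '=', company)

# "Unzipping": zip(*pairs) regroups the tuples back into two sequences
names_again, tickers_again = zip(*z_list)
print(names_again)       # ('Apple', 'Microsoft', 'Tesla')

# A zip object is an iterator, so it can be consumed only once
z = zip(company_names, tickers)
first_pass = list(z)
second_pass = list(z)
print(second_pass)       # []
```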
class Counter(object):
    def __init__(self, start, end):
        """Initialize the object"""
        self.current = start
        self.end = end

    def __iter__(self):
        """Returns itself as an iterator object"""
        return self

    def __next__(self):
        """Returns the next element in the series"""
        if self.current > self.end:
            raise StopIteration
        else:
            self.current += 1
            return self.current - 1
We created a Counter class which takes two arguments: start (the start of
the counter) and end (the end of the counter). The __init__() method
is a constructor method which initializes the object with the start and end
parameters received. The __iter__() method returns the iterator object
and the __next__() method computes the next element within the series
and returns it. Now we can use the above-defined iterator in our code as
shown below:
# Create a Counter object that counts from 1 to 5
counter = Counter(1, 5)
# Run a loop over the newly created object and print its
# values
for element in counter:
    print(element)
# Output
1
2
3
4
5
Remember that an iterator object can be used only once. It means once
we have traversed through all elements of an iterator, and it has raised
StopIteration, it will keep raising the same exception. So, if we run the
above for loop again, Python will not provide us with any output.
Internally it will keep raising the StopIteration exception. This can be
verified using the next() function.
In []: next(counter)
Traceback (most recent call last):
StopIteration
8.2 Generators
Python generators give us an easier way to create iterators. But before we
attempt to learn what generators in Python are, let us recall the list
comprehension we learned in the previous section. To create a list of the
first 10 even digits, we can use the comprehension as shown below:
But what actually are generator objects? Well, a generator object is like a
list comprehension except it does not store the list in memory; it does not
construct the list but is an object we can iterate over to produce elements
of the list as required. For example:
In []: type(numbers)
Out[]: generator
Here we can see that looping over a generator object produces the elements
of the analogous list. We can also pass the generator to the list() function
to build the list.
In []: list(numbers)
Out[]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Moreover, like any other iterator, we can pass a generator to the function
next() to iterate through its elements.
In []: next(numbers)
Out[]: 0
In []: next(numbers)
Out[]: 1
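The definition of numbers is not shown above. A sketch assuming it is a generator expression over range(10), which also shows that values already consumed by next() are not produced again:

```python
numbers = (number for number in range(10))   # parentheses, not brackets

print(type(numbers).__name__)   # generator
print(next(numbers))            # 0
print(next(numbers))            # 1

rest = list(numbers)            # only the values not yet consumed
print(rest)                     # [2, 3, 4, 5, 6, 7, 8, 9]
print(list(numbers))            # []: the generator is now exhausted
```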
In the above function, the while loop keeps running as long as start is
less than or equal to end; after that, the generator ceases to yield values.
Calling the above function will return a generator object.
In []: c = counter(1, 5)
In []: type(c)
Out[]: generator
And again, as seen above, we can call the list() function or run a loop
over generator object to traverse through its elements. Here, we pass the
object c to the list() function.
In []: list(c)
Out[]: [1, 2, 3, 4, 5]
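The generator function itself is not shown above. A minimal sketch consistent with the description (a while loop that runs while start is less than or equal to end) and with the output:

```python
def counter(start, end):
    """Yield integers from start to end, inclusive."""
    while start <= end:
        yield start     # hand back one value, then pause here
        start += 1

c = counter(1, 5)
print(list(c))          # [1, 2, 3, 4, 5]
print(list(c))          # []: the generator is exhausted after one pass
```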
This brings us to the end of this section. Iterators are a powerful and
useful tool in Python and generators are a good approach to working with
lots of data. If we don’t want to load all the data into memory, we can use
a generator which will pass us one piece of data at a time. Using the
generator implementation saves memory.
Chapter 9
Functions in Python
Let us now explore a remarkably handy feature found in almost all
programming languages: functions. There are lots of fantastic built-in
functions in Python and its ecosystem. However, as Python programmers, we
often need to write custom functions to solve problems that are unique to
our needs. Here is the definition of a function.
A function is a block of code (that performs a specific task) which runs only when it
is called.
From the definition, it can be inferred that writing such blocks of code, i.e.
functions, provides benefits such as
calling a function. For example, if we want to compute the length of a list,
we call the built-in len function. Using any function means we are calling it
to perform the task for which it is designed.
We need to provide an input to the len function while calling it. The
input we provide to the function is called an argument. It can be a data
structure, string, value or a variable referring to them. Depending upon
the functionality, a function can take single or multiple arguments.
• abs(value) returns the absolute value of a value provided as an argu-
ment.
• format(value[, format_spec]) converts a value to a ’formatted’ rep-
resentation, as controlled by format_spec.
• str([object]) returns a string version of object. If the object is not
provided, returns the empty string.
• bool([value]) returns a Boolean value, i.e. one of True or False. value
is converted using the standard truth testing procedure¹. If the value
is false or omitted, this returns False; otherwise, it returns True.
• dir([object]) returns the list of names in the current local scope
when an argument is not provided. With an argument, it attempts
to return a list of valid attributes for that object.
• len(object) returns the length (the number of items) of an object. The
argument may be a sequence (such as a string, bytes, tuple, list, or
range) or a collection (such as a dictionary, set, or frozen set).
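The built-in functions listed above can be tried out directly. The following sketch uses arbitrary illustrative values:

```python
# abs: absolute value of a number
print(abs(-7.5))

# format: formatted representation controlled by a format spec
print(format(3.14159, '.2f'))

# str: string version of an object, or the empty string without one
print(str(42))
print(repr(str()))

# bool: truth value, False when the argument is false or omitted
print(bool(0), bool('text'), bool())

# dir: list of valid attribute names for an object
print('upper' in dir(str))

# len: number of items in a sequence or collection
print(len('Python'), len([1, 2, 3]), len({'a': 1}))
```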
It is worth noting that almost all built-in functions take one or more
arguments, perform a specific operation on them and return the output. We
will keep learning about many more built-in functions as we progress
through our Python learning journey. More information about various
built-in functions can be obtained from the official Python documentation².
Functions are defined using the def keyword, followed by an identifier name
along with the parentheses, and by the final colon that ends the line. The
1 https://docs.python.org/3/library/stdtypes.html#truth
2 https://docs.python.org/3/library/functions.html
def greet():
    """Block of statement.
    or Body of function.
    """
    print(' Hello from inside the function!')
The above defined greet function can be called using its name as shown
here.
The modified version of the above simple function explains these two
terms:
# Here 'person_name' is a parameter.
def greet(person_name):
    """Prints greetings along with the value received
    via the parameter."""
    print('Hello ' + person_name + '!')
The above call to the function greet takes a string Amigo as an argument
and the output will be as follows:
Hello Amigo!
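The call that produces this output is not shown above. A minimal sketch, which also marks the difference between a parameter and an argument:

```python
# 'person_name' is the parameter: the name used inside the definition.
def greet(person_name):
    """Print a greeting for the given person."""
    print('Hello ' + person_name + '!')


# 'Amigo' is the argument: the actual value passed in the call.
greet('Amigo')
```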
This user-defined function add takes two parameters a and b, sums them,
assigns the sum to a variable result and ultimately returns that variable
to the calling statement as shown below:
We call the function add with two arguments x and y (as the function def-
inition has two parameters) initialized with 5 and 6 respectively, and the
addition returned by the function gets printed via the print statement as
shown below:
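Neither the definition nor the call appears in the text at this point. Following the description (parameters a and b, a variable result, and arguments x = 5 and y = 6), a sketch could be:

```python
# Function definition
def add(a, b):
    """Return the sum of a and b."""
    result = a + b
    return result


# Function call with two arguments
x = 5
y = 6
print(add(x, y))
```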
Similarly, functions can also return multiple values based on the implemen-
tation. The following function demonstrates the same.
# Function definition
def upper_lower(x):
    """
    Returns the upper and lower version of the string.
    """
    upper = x.upper()
    lower = x.lower()
    return upper, lower
The above upper_lower function takes one argument x (a string) and
converts it to its uppercase and lowercase versions. Let us call it and see
the output.
# Printing output
print(upper)
PYTHON
print(lower)
python
Here, the result of the call to the upper_lower function has been assigned
to two variables, upper and lower. The function returns two values, which
are unpacked into these variables in order, as can be verified in the
output shown above.
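The unpacking step can be sketched as follows. The input string 'Python' is an assumption chosen to match the printed output; the original argument is not shown:

```python
def upper_lower(x):
    """Return the uppercase and lowercase versions of a string."""
    return x.upper(), x.lower()


# The two returned values are unpacked into two names.
upper, lower = upper_lower('Python')
print(upper)
print(lower)

# Without unpacking, the return value is a single tuple.
both = upper_lower('Python')
print(both)
```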
Notice that the above function computes the first argument to the power
of the second argument. The default value of the latter is 2. So now when
we call the function power only with a single argument, it will be assigned
to the number parameter and the return value will be obtained by squaring
number.
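The definition under discussion is not reproduced here. A sketch consistent with the description, assuming the second parameter is called exponent (its original name is not recoverable) and that the calls use 2 and 5:

```python
def power(number, exponent=2):
    """Raise number to the given exponent, which defaults to 2."""
    return number ** exponent


# Only one argument: the default exponent of 2 is used.
print(power(2))

# Two arguments: the default value is overridden.
print(power(2, 5))
```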
# Output
4
# Output
32
def sum_all(*args):
    """Sum all values in the *args."""
    # Initialize result to 0
    result = 0
    # Sum all values
    for i in args:
        result += i
    # Return the result
    return result
# Output
15
# Output
41
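The calls behind these two outputs are not shown. Calls of the following kind would produce such totals; the specific argument values are assumptions:

```python
def sum_all(*args):
    """Sum all positional arguments, however many are given."""
    result = 0
    for value in args:
        result += value
    return result


# Any number of positional arguments is accepted.
print(sum_all(1, 2, 3, 4, 5))
print(sum_all(5, 10, 12, 14))
```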
Here, args (shorthand for arguments) is used as the parameter name, but we
can use any valid identifier as the parameter name. It just needs to be
preceded by * to make it flexible in length. Inside the function, the
positional arguments are available as a tuple with that name. Along the
same lines, Python provides another flavor of flexible arguments, which
are preceded by a double asterisk. When used, the keyword arguments are
packed into a dictionary (with the same name) by the interpreter and are
available to use within the function. For example:
def info(**kwargs):
    """Print out key-value pairs in **kwargs."""
    for key, value in kwargs.items():
        print(key + ': ' + str(value))
Here, the parameter **kwargs collects what are known as keyword arguments,
which are gathered into a dictionary of the same name. We then loop over it
and print all keys and values. Again, it is totally valid to use an
identifier other than kwargs as the parameter name. The info function can
be called as follows:
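The call is missing from the text. Judging by the output that follows, it presumably passes four keyword arguments, as in this sketch:

```python
def info(**kwargs):
    """Print out key-value pairs in **kwargs."""
    for key, value in kwargs.items():
        print(key + ': ' + str(value))


# Each keyword argument becomes one key-value pair in kwargs.
info(ticker='AAPL', price=146.83, name='Apple Inc.', country='US')
```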
# Output
ticker: AAPL
price: 146.83
name: Apple Inc.
country: US
That is all about the default and flexible arguments. We now attempt to
head towards the documentation part of functions.
9.2.5 DocStrings
Python has a nifty feature called documentation string, usually referred to by
its shorter name docstrings. This is an important but not required tool that
should be used every time we write a program since it helps to document
the program better and makes it easier to understand.
Docstrings are written within triple single/double quotes just after defini-
tion header. They are written on the first logical line of a function. Doc-
strings are not limited to functions only; they also apply to modules and
classes. The convention followed for a docstring is a multi-line string where
the first line starts with a capital letter and ends with a dot. The second line
is blank followed by any detailed explanation starting from the third line.
It is strongly advised to follow this convention for all docstrings. Let’s see
this in practice with the help of an example:
def power(x, y):
    """
    Equivalent to x**y or built-in pow() with two
    arguments.

    Parameters:
    x (int or float): Base value for the power operation.
    y (int or float): Power to which base value should be
    raised.

    Returns:
    int or float: It returns x raised to the power of y.
    """
    try:
        return x ** y
    except Exception as e:
        print(e)
The function power defined above returns x raised to the power of y. The
thing of interest here is the docstring written within """ which documents
the function. We can access the docstring of any function using the
__doc__ attribute (notice the double underscores) of that function. The
docstring for the power function can be accessed with the following code:
print(power.__doc__)
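    Equivalent to x**y or built-in pow() with two
    arguments.

    Parameters:
    x (int or float): Base value for the power operation.
    y (int or float): Power to which base value should be
    raised.

    Returns:
    int or float: It returns x raised to the power of y.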
We define the function outer which nests another function inner within
it. The outer function is referred to as an enclosing function and inner is
known as a nested function. Nested functions are also sometimes referred to
as inner functions. Upon calling the outer function, its body in turn calls
the inner function nested inside it. The output for the same is shown
below:
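The definition being described is not reproduced in the text. Going by the printed messages, it might look like this sketch:

```python
def outer():
    """Enclosing function."""
    print('Got printed from the outer function.')

    def inner():
        """Nested function, visible only inside outer."""
        print('Got printed from the nested function.')

    # The nested function runs only because outer calls it here.
    inner()


outer()
```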
# Calling the 'outer' function
outer()
# Output
Got printed from the outer function.
Got printed from the nested function.
The output we got here is intuitive. First, the print statement within the
outer function got executed, followed by the print statement in the inner
function. Additionally, nested functions can access variables of the
enclosing functions, i.e. variables defined in the outer function can be
read by the inner function. However, by default the nested function cannot
assign a new value to a variable defined in the enclosing function.
def outer(n):
    number = n

    def inner():
        print('Number =', number)

    inner()

outer(5)
# Output
Number = 5
Though the variable number is not defined within the inner function, the
function is able to access and print it. This is possible because of the
scope mechanism that Python provides. We discuss more on this in the
following section.
Now consider what happens if we want the nested function to modify a
variable that is declared in the enclosing function. The default behavior
of Python does not allow this: an assignment inside the nested function
creates a new local variable, and reading that variable before it has been
assigned raises an UnboundLocalError. To handle such a situation, the
keyword nonlocal comes to the rescue.
def outer(n):
    number = n

    def inner():
        nonlocal number
        number = number ** 2
        print('Square of number =', number)

    print('Number =', number)
    inner()
    print('Number =', number)
A call to the outer function will now print the number passed as an argu-
ment to it, the square of it and the newly updated number (which is nothing
but the squared number only).
outer(3)
# Output
Number = 3
Square of number = 9
Number = 9
9.3.1 Names in the Python world
A name (also known as an identifier) is simply a name given to an object.
From Python basics, we know that everything in Python is an object, and a
name is a way to access the underlying object. Let us create a new variable
with the name price having a value 144, and check the memory location
identifier returned by the function id.
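The code for the two cases is not shown. A sketch of what they likely compare is given below. It relies on a CPython implementation detail (small integers are cached and reused), so other interpreters may behave differently:

```python
# Create a name 'price' that refers to the integer object 144
price = 144

# Case 1: identifier of the object the name refers to
print(id(price))

# Case 2: identifier of the integer object 144 itself
print(id(144))

# Both identifiers match when both refer to the same object.
same_location = id(price) == id(144)
print(same_location)
```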
# Case 1: Output
1948155424
# Case 2: Output
1948155424
Interestingly, we see that the memory location in both cases (the variable
and its assigned value) is the same. In other words, both refer to the same
integer object. If you execute the above code on your workstation, the
memory location will almost certainly be different, but it will be the
same for both the variable and the value. Let us add more fun to it.
Consider the following code:
# Print price
print(price)
# Output
Memory location of price: 1948155456
Memory location of 145: 1948155456
# Output
Memory location of old_price: 1948155424
Memory location of 144: 1948155424
We increased the value of the variable price by 1 unit and see that its
memory location changed. As you may have guessed, the memory location of
the integer object 145 is the same as that of price. However, if we check
the memory location of the variable old_price, it still points to the
memory location of the integer object 144. This is efficient, as Python
does not need to create duplicate objects. It also makes Python powerful
in the sense that a name can refer to any object, even a function. Note
that functions are also objects in Python. Now that we are aware of the
nitty-gritty of names in Python, we are ready to examine namespaces
closely.
9.3.2 Namespace
Name conflicts happen all the time in real life. For example, we often see
that there are multiple students with the same name X in a classroom. If
someone has to call the student X, there would be a conflicting situation for
determining which student X is actually being called. While calling, one
might use the last name along with the student’s first name to ensure that
the call is made to the correct student X.
Similarly, such conflicts also arise in programming. It is easy and manage-
able to have unique names when programs are small without any external
dependencies. Things start becoming complex when programs become
larger and external modules are incorporated. It becomes difficult and
wearisome to have unique names for all objects in the program when it
spans hundreds of lines.
9.3.3 Scopes
Until now we have been using objects anywhere in a program. However, an
important thing to note is that not all objects are accessible everywhere
in a program. This is where the concept of scope comes into the picture.
A scope is a region of a Python program where a namespace is directly
accessible. That is, when a reference to a name (of a list, tuple,
variable, etc.) is made, Python attempts to find the name in the
namespace. The different types of scopes are:
Local scope: Names in a local scope are those defined inside a function.
They are accessible only within that function.
# Defining a function
def print_number():
    # This is local scope
    n = 10
    # Printing number
    print('Within function: Number is', n)

print_number()
# Output
Within function: Number is 10
Enclosing scope: Names in the enclosing scope refer to the names defined
within enclosing functions. When there is a reference to a name that is not
available within the local scope, it will be searched within the enclosing
scope. This is known as scope resolution. The following example helps us
understand this better:
def outer():
    number = 10

    def inner():
        print('Number is', number)

    inner()

outer()
# Output
Number is 10
We try to print the variable number from within the inner function where
it is not defined. Hence, Python tries to find the variable in the outer
function which works as an enclosing function. What if the variable is not
found within the enclosing scope as well? Python will try to find it in the
global scope which we discuss next.
Global scope: Names in the global scope are those defined within the main
script of a program. They are accessible almost everywhere within the
program. Consider the following example where we define a variable n
before a function definition (that is, within global scope) and define
another variable with the same name n within the function.
# Global variable
n = 3

def relu(val):
    # Local variable
    n = max(0, val)
    return n
# Output
First statement: 0
Second statement: 3
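The two print statements that produce this output are not shown. A sketch, assuming the first prints the result of relu(-3) and the second prints the global n:

```python
# Global variable
n = 3


def relu(val):
    # Local variable: this 'n' exists only inside the function
    n = max(0, val)
    return n


first = relu(-3)
print('First statement:', first)

# The global 'n' is untouched by the assignment inside relu.
print('Second statement:', n)
```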
Here, the first print statement calls the relu function with a value of -3
which evaluates the maximum number to 0 and assigns the maximum
# Global variable
number = 5
# Output
Traceback (most recent call last):
word. The global keyword allows us to access and modify the global name
within the local scope. Let us run the above code, but with the global
keyword.
# Global variable
number = 5
# Output
Within function: Number is 7
Outside function: Number is 7
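The function body for this example is missing from the text. A sketch that reproduces the output above, assuming a hypothetical function name update_number that adds 2 to the global value:

```python
# Global variable
number = 5


def update_number():
    # Declare that 'number' refers to the global name. Without this
    # line, the assignment below would raise UnboundLocalError.
    global number
    number = number + 2
    print('Within function: Number is', number)


update_number()
print('Outside function: Number is', number)
```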
Firstly, the syntax shows that there is no function name. Secondly, argu-
ments refers to parameters, and finally, expression depicts the function body.
Let us create a function square which squares the argument provided to it
and returns the result. We create this function using the def keyword.
# Function definition
def square(arg):
    """
    Computes the square of an argument and returns the
    result.
    """
    result = arg ** 2
    return result
# Output
9
The function square defined above can be re-written in a single line using
the lambda keyword as shown below:
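The one-line version is not reproduced in the text. A minimal sketch:

```python
# A lambda is an anonymous function: parameters before the colon,
# a single expression (the return value) after it.
square = lambda arg: arg ** 2

print(square(3))
```

Binding a lambda to a name, as done here, is convenient for illustration; for named functions, def is usually preferred in practice.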
# Calling the lambda function using the name 'square'
print(square(3))
# Output
9
The above lambda function takes one argument, denoted by arg, and returns
its square. A lambda function can take any number of arguments, listed
after the lambda keyword in its definition. We will restrict our
discussion to two arguments to understand how multiple arguments work. We
create another lambda function to raise the first argument to the power of
the second argument.
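This two-argument lambda is not shown. A sketch, assuming the names power, a and b and the arguments 2 and 3:

```python
# Two parameters, separated by a comma, before the colon
power = lambda a, b: a ** b

print(power(2, 3))
```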
# Output
8
Lambda functions are extensively used along with built-in map and filter
functions.
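The map example is missing from the text. A sketch that matches the surrounding description (a list nums and a result stored in squares):

```python
nums = [1, 2, 3, 4, 5]

# map applies the lambda to every element of nums, lazily.
squares = map(lambda num: num ** 2, nums)
print(squares)

# Converting to a list forces the computation and shows the values.
squares_list = list(squares)
print(squares_list)
```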
The lambda function in the above example squares each element of the list
nums; the map function applies it to every element and yields each result
in the position of the corresponding element in the original list. We then
store the result in a variable called squares. If we print the squares
variable, Python shows us that it is a map object.
# Printing squares
print(squares)
# Output
<map object at 0x00000000074EAD68>
To see what this object contains, we need to convert it to a list using
the list function as shown below:
# Output
[1, 4, 9, 16, 25]
# Output
[True, True, True]
In the above example, we first create a list of random boolean values. Next,
we pass it to the filter function along with the None which specifies to
return the items that are true. Lastly, we cast the output of the filter func-
tion to a list as it outputs a filter object. In a more advanced scenario, we
can embed a lambda function in the filter function. Consider that we
have been given a scenario where we need to filter all strings whose length
is greater than 3 from a given set of strings. We can use filter and lambda
functions together to achieve this. This is illustrated below:
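Neither filter example is reproduced in the text. The following sketch covers both cases; the input lists are assumptions chosen to match the outputs shown:

```python
# Case 1: passing None keeps only the items that are true.
booleans = [True, False, True, False, True]
truthy = list(filter(None, booleans))
print(truthy)

# Case 2: a lambda decides which items to keep.
strings = ['one', 'two', 'three', 'four', 'five']
long_strings = list(filter(lambda s: len(s) > 3, strings))
print(long_strings)
```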
# Output
['three', 'four', 'five']
In the above example, a lambda function is used within the filter function
to check the length of each string in the strings list. The filter
function then keeps only the strings which match the criterion defined by
the lambda function.
Apart from the map and filter functions discussed above, now we will
learn another handy function zip which can be used for iterating through
multiple sequences simultaneously.
In the Python world, the zip function works more or less as a container for
iterables instead of real files. The syntax for the zip is shown below:
zip(*iterables)
It takes one or more iterables as input and returns an iterator that
aggregates elements from each of the iterables. The iterator yields
tuples: the i-th element of the iterator is a tuple consisting of the i-th
element from each input. If the iterables in the input are of unequal
sizes, the output iterator stops when the shortest input iterable is
exhausted. With no input, it returns an empty iterator. Let us understand
the working of zip with the help of an example.
We define two lists tickers and companies which are used as inputs to
zip. The zipped object is an iterator of type zip, and hence we can
iterate over it using a loop to print its content:
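The definitions and the loop are not shown. A sketch using the values visible in the outputs of this section (the exact wording of the printed message is an assumption):

```python
tickers = ['AAPL', 'MSFT', 'GOOG']
companies = ['Apple Inc.', 'Microsoft Corporation', 'Alphabet Inc.']

# zip pairs up the i-th elements of both lists.
zipped = zip(tickers, companies)

for ticker, company in zipped:
    print('Ticker name of', ticker, 'is', company)

# A zip object is exhausted after one pass, so it is rebuilt
# before being converted to a list.
pairs = list(zip(tickers, companies))
print(pairs)
```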
# Output
Ticker name of AAPL is Apple Inc.
Ticker name of MSFT is Microsoft Corporation.
Ticker name of GOOG is Alphabet Inc.
# Casting the zip object to a list and printing it
print(list(zipped))
# Output
[('AAPL', 'Apple Inc.'),
('MSFT', 'Microsoft Corporation'),
('GOOG', 'Alphabet Inc.')]
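The reverse operation, often called unzipping, uses the * operator to pass the pairs back to zip as separate arguments. A sketch, assuming the names new_tickers and new_companies:

```python
pairs = [('AAPL', 'Apple Inc.'),
         ('MSFT', 'Microsoft Corporation'),
         ('GOOG', 'Alphabet Inc.')]

# zip(*pairs) regroups the first items together and the second
# items together, giving one tuple per original sequence.
new_tickers, new_companies = zip(*pairs)

print(new_tickers)
print(new_companies)
```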
print(new_companies)
('Apple Inc.', 'Microsoft Corporation', 'Alphabet Inc.')
Chapter 10
NumPy Module
NumPy is not a part of the Python Standard Library and hence, as with
any other such library or module, it needs to be installed on a workstation
before it can be used. Based on the Python distribution one uses, it can
be installed via a command prompt, conda prompt, or terminal using the
following command. One point to note is that if we use the Anaconda
distribution to install Python, most of the libraries (like NumPy, pandas,
scikit-learn, matplotlib, etc.) used in the scientific Python ecosystem
come pre-installed.
NumPy library, the command to install it would be preceded by
the character !.
Once installed we can use it by importing into our program by using the
import statement. The de facto way of importing is shown below:
import numpy as np
Here, the NumPy library is imported with an alias of np so that any func-
tionality within it can be used with convenience. We will be using this form
of alias for all examples in this section.
put_vol / call_vol
The NumPy array is pretty similar to the list, but has one useful feature:
we can perform operations over entire arrays (all elements in arrays). It
is easy as well as super fast. Let us start by creating a NumPy array. To
do this, we use the array() function from the NumPy package and create the
NumPy versions of the put_vol and call_vol lists.
# Importing NumPy library
In []: import numpy as np
# Creating arrays
In []: n_put_vol = np.array(put_vol)
In []: n_put_vol
Out[]: array([52.89, 45.14, 63.84, 77.1 , 74.6 ])
In []: n_call_vol
Out[]: array([49.51, 50.45, 59.11, 80.49, 65.11])
Here, we have two arrays n_put_vol and n_call_vol which hold put and
call volumes respectively. Now, we can calculate PCR in one line:
# Computing Put Call Ratio (PCR)
In []: pcr = n_put_vol / n_call_vol
In []: pcr
Out[]: array([1.06826904, 0.89474727, 1.0800203,
0.95788297, 1.14575334])
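The whole comparison can be sketched in one self-contained block. The list values are taken from the array outputs printed above:

```python
import numpy as np

put_vol = [52.89, 45.14, 63.84, 77.1, 74.6]
call_vol = [49.51, 50.45, 59.11, 80.49, 65.11]

# Plain lists do not support element-wise division.
try:
    put_vol / call_vol
except TypeError as error:
    print('Lists cannot be divided:', error)

# NumPy arrays divide element by element.
n_put_vol = np.array(put_vol)
n_call_vol = np.array(call_vol)
pcr = n_put_vol / n_call_vol
print(pcr)
```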
First, when we tried to compute PCR with regular lists, we got an error,
because Python cannot do calculations with lists the way we want it to.
Then we converted these regular lists to NumPy arrays and the same
operation worked without any problem. NumPy works with arrays as if they
were scalars. But we need to pay attention here. NumPy can do this easily
because it assumes that an array contains values of a single type only: an
array of integers, of floats, of booleans and so on. If we try to create
an array of different types like the one mentioned below, the resulting
NumPy array will contain a single type only, a string in the below case:
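The example is missing at this point. A sketch with assumed values (an integer, a string and a boolean) shows the coercion:

```python
import numpy as np

# Mixing an integer, a string and a boolean in one array
mixed = np.array([1, 'Python', True])

# Every element has been converted to a string.
print(mixed)
print(mixed.dtype)
```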
In the example given above, an integer and a boolean were both converted
to strings. The NumPy array is a new data structure type, like the Python
list type that we have seen before. This also means that it comes with its
own methods, which behave differently from those of other types. Let us
apply the + operation to Python lists and NumPy arrays and see how they
differ.
# Creating lists
In []: list_1 = [1, 2, 3]
# Creating arrays
In []: arr_1 = np.array([1, 2, 3])
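Only part of this comparison survives in the text. A complete sketch, where list_2 and arr_2 and their values are assumptions:

```python
import numpy as np

# With lists, + concatenates.
list_1 = [1, 2, 3]
list_2 = [4, 5, 6]
print(list_1 + list_2)

# With arrays, + adds element by element.
arr_1 = np.array([1, 2, 3])
arr_2 = np.array([4, 5, 6])
print(arr_1 + arr_2)

# Both arrays are of type numpy.ndarray.
print(type(arr_1))
```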
Based on the output we got, it can be inferred that they are of data type
ndarray, which stands for n-dimensional array within NumPy. These arrays
are one-dimensional, but NumPy also allows us to create two-dimensional,
three-dimensional and higher-dimensional arrays. We will stick to two
dimensions for our learning purpose in this module. We can create a 2D
(two-dimensional) NumPy array from a regular Python list of lists. Let us
create one array for all put and call volumes.
In []: call_vol
Out[]: [49.51, 50.45, 59.11, 80.49, 65.11]
In []: n_2d
Out[]:
array([[52.89, 45.14, 63.84, 77.1 , 74.6 ],
[49.51, 50.45, 59.11, 80.49, 65.11]])
We see that the n_2d array is a rectangular data structure. Each list
provided to the np.array creation function corresponds to a row in the
two-dimensional NumPy array. For 2D arrays, too, the NumPy rule applies:
an array can only contain a single type. If we change one float value in
the above array definition to a string, all the array elements will be
coerced to strings, to end up with a homogeneous array. We can think of a
2D array as an advanced version of a list of lists. We can perform
element-wise operations on 2D arrays just as we did with one-dimensional
arrays.
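A compact sketch of creating the 2D array and operating on it element-wise, using the volume values shown earlier:

```python
import numpy as np

put_vol = [52.89, 45.14, 63.84, 77.1, 74.6]
call_vol = [49.51, 50.45, 59.11, 80.49, 65.11]

# Each inner list becomes one row of the 2D array.
n_2d = np.array([put_vol, call_vol])
print(n_2d)

# Element-wise operations work on the whole 2D array at once.
doubled = n_2d * 2

# Rows can be combined element-wise as well (row 0 / row 1).
pcr = n_2d[0] / n_2d[1]
print(pcr)
```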
[0., 0., 0., 0., 0.]])
• rand([d0, d1, ..., dn]) is used to create an array of a given shape
and populate it with random samples from a uniform distribution over
[0, 1). The dimensions must be non-negative integers. If no argument
is provided, a single float value is returned.
In []: np.random.random(3)
Out[]: array([0.69929315, 0.61152299, 0.91313813])
In []: samples
Out[]:
array([1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0,
0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1,
0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0])
NumPy arrays also have a rich set of attributes and methods which
simplify the data analysis process to a great extent. Following are the
most useful array attributes. For illustration purposes, we will be using
previously defined arrays.
In []: n_2d.ndim
Out[]: 2
• shape returns a tuple with the dimensions of the array. It may also
be used to reshape the array in-place by assigning a tuple of array
dimensions to it.
In []: n_call_vol.size
Out[]: 5
In []: n_2d.size
Out[]: 10
In []: n_put_vol.dtype
Out[]: dtype('float64')
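The attributes discussed above can be checked together in one place. This sketch rebuilds the arrays from the volume values shown earlier so that it runs on its own:

```python
import numpy as np

n_put_vol = np.array([52.89, 45.14, 63.84, 77.1, 74.6])
n_call_vol = np.array([49.51, 50.45, 59.11, 80.49, 65.11])
n_2d = np.array([n_put_vol, n_call_vol])

print(n_2d.ndim)       # number of dimensions
print(n_2d.shape)      # size along each dimension
print(n_2d.size)       # total number of elements
print(n_2d.dtype)      # type shared by all elements

# reshape returns a rearranged view and leaves n_2d itself unchanged.
reshaped = n_2d.reshape(5, 2)
print(reshaped.shape)
```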
A typical first step in analyzing data is getting to know the data in the
first place. In a typical data analysis process, we have thousands of
numbers which need to be analyzed. Simply staring at these numbers will
not provide us with any insights. Instead, what we can do is generate
summary statistics of the data. Among many useful features, NumPy also
provides various statistical functions for computing such statistics on
arrays.
Let us create a samples array and populate it with samples drawn from
a normal distribution with a mean of 5 and standard deviation of 1.5 and
compute various statistics on it.
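A sketch of this workflow is given below. The sample size of 1000 and the seed are assumptions; because the samples are random, the exact numbers will differ from those printed in this chapter:

```python
import numpy as np

# Fix the seed so that repeated runs give the same samples.
np.random.seed(42)

# 1000 samples from a normal distribution: mean 5, std deviation 1.5
samples = np.random.normal(5, 1.5, size=1000)

# Summary statistics using the np. function form
print('Mean:    ', np.mean(samples))
print('Median:  ', np.median(samples))
print('Variance:', np.var(samples))
print('Std dev: ', np.std(samples))

# The same statistics are available as methods on the array itself.
print('Sum (function):', np.sum(samples))
print('Sum (method):  ', samples.sum())
```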
# drawn from a normal distribution
In []: samples_2d = np.random.normal(5, 1.5, size=(5, 5))
In []: samples_2d
Out[]:
array([[5.30338102, 6.29371936, 2.74075451, 3.45505812,
7.24391809],
[5.20554917, 5.33264245, 6.08886915, 5.06753721,
6.36235494],
[5.86023616, 5.54254211, 5.38921487, 6.77609903,
7.79595902],
[5.81532883, 0.76402556, 5.01475416, 5.20297957,
7.57517601],
[5.76591337, 1.79107751, 5.03874984, 5.05631362,
2.16099478]])
# Computing mean
In []: np.mean(samples)
Out[]: 5.009649198007546
In []: np.average(samples)
Out[]: 5.009649198007546
In []: np.median(samples_2d)
Out[]: 5.332642448141249
In []: np.var(samples_2d)
Out[]: 2.93390175942658
• std(a, axis=None) returns the standard deviation of an array or
along the specified axis.
In []: np.std(samples)
Out[]: 1.5154965981337756
In []: np.std(samples_2d)
Out[]: 1.7128636137844075
The methods discussed above can also be directly called upon NumPy ob-
jects such as samples, n_put_vol, samples_2d, etc. instead of using the np.
format as shown below. The output will be the same in both cases.
# Using np. format to compute the sum
In []: np.sum(samples)
Out[]: 5009.649198007546
The term broadcasting describes how NumPy treats arrays with different
shapes during arithmetic operations (with certain constraints). The smaller
array is ’broadcast’ across the larger array so that they have compatible
shapes. It also provides a means of vectorizing array operations.
In []: a * b
Out[]: array([3, 6, 9])
NumPy’s broadcasting rule relaxes this constraint when the arrays’ shapes
meet certain conditions. The simplest broadcasting example occurs when
an array and a scalar value are combined in an operation as depicted below:
In []: b = 3
In []: a * b
Out[]: array([3, 6, 9])
stretching analogy is only conceptual. NumPy is smart enough to use
the original scalar value without actually making copies so that broad-
casting operations are as memory and computationally efficient as possible.
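The two multiplication cases discussed above, plus one extra case that goes slightly beyond the text (a 1D array broadcast across the rows of a 2D array), can be sketched as follows:

```python
import numpy as np

a = np.array([1, 2, 3])

# Same shapes: plain element-wise multiplication
b = np.array([3, 3, 3])
same_shape = a * b

# Array and scalar: the scalar is broadcast to every element.
with_scalar = a * 3

# Shapes (2, 3) and (3,): the 1D array is broadcast across each row.
matrix = np.array([[0, 0, 0],
                   [10, 10, 10]])
shifted = matrix + a

print(same_shape)
print(with_scalar)
print(shifted)
```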
The code in the last example is more efficient because broadcasting moves
less memory around during the multiplication than that of its counter-
part defined above it. Along with efficient number processing capabili-
ties, NumPy also provides various methods for array manipulation thereby
proving versatility. We discuss some of them here.
In []: res
Out[]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In []: demo
Out[]: array([0, 1, 2, 3])
# Printing array
In []: a
Out[]: array([0.71056952, 0.58306487, 0.13270092,
0.38583513, 0.7912277 ])
# Rounding to 0 decimals
In []: a.round()
Out[]: array([1., 1., 0., 0., 1.])
# Rounding to 2 decimals
In []: a.round(2)
Out[]: array([0.71, 0.58, 0.13, 0.39, 0.79])
In []: np.sort(n_put_vol)
Out[]: array([45.14, 52.89, 63.84, 74.6 , 77.1 ])
In []: np.sort(samples_2d)
Out[]:
array([[2.74075451, 3.45505812, 5.30338102, 6.29371936,
7.24391809],
[5.06753721, 5.20554917, 5.33264245, 6.08886915,
6.36235494],
[5.38921487, 5.54254211, 5.86023616, 6.77609903,
7.79595902],
[0.76402556, 5.01475416, 5.20297957, 5.81532883,
7.57517601],
[1.79107751, 2.16099478, 5.03874984, 5.05631362,
5.76591337]])
[2, 5, 8],
[3, 6, 9]])
# Printing it
In []: a
Out[]:
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
Index      0    1    2    3    4
np.array   52   88   41   63   94

Index   0   1   2
0       a   b   c
1       d   e   f
2       g   h   i
The two arrays arr_1d and arr_2d, which depict the above-shown
structures, have been created below:
We use square brackets [] to subset each element from NumPy arrays. Let
us subset arrays created above using indexing.
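The creation and indexing code is largely missing from the text. A sketch based on the two structures shown above:

```python
import numpy as np

arr_1d = np.array([52, 88, 41, 63, 94])
arr_2d = np.array([['a', 'b', 'c'],
                   ['d', 'e', 'f'],
                   ['g', 'h', 'i']])

# One-dimensional indexing and slicing
first = arr_1d[0]          # first element
last = arr_1d[-1]          # last element
first_two = arr_1d[:2]     # elements at positions 0 and 1

# Two-dimensional indexing uses [row, column].
element = arr_2d[0, 1]     # row 0, column 1
last_column = arr_2d[:, -1]  # all rows, last column

print(first, last, first_two)
print(element, last_column)
```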
# Slicing the element at position (0, 1)
In []: arr_2d[0, 1]
Out[]: 'b'
Notice the syntax in the last example where we slice the last column. The
: provided as the first index denotes all rows, and the second index then
selects the last column. Using only : would return all elements in the
array.
In []: arr_2d[:]
Out[]:
array([['a', 'b', 'c'],
['d', 'e', 'f'],
['g', 'h', 'i']], dtype='<U1')
We create an array arr with ten elements in the above example. Then we
subset it using an anonymous index array. The index array, consisting of
the values 2, 5, 5 and 1, produces an array of length 4, i.e. the same
length as the index array. Values in the index array work as indices into
arr, and the operation simply returns the corresponding values from arr.
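The example itself is not reproduced in the text. A sketch with an assumed ten-element array:

```python
import numpy as np

# Ten elements: 100, 101, ..., 109 (values chosen for illustration)
arr = np.arange(100, 110)

# Index with an array of positions; positions may repeat.
picked = arr[np.array([2, 5, 5, 1])]
print(picked)
```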
Extending this concept, an array can be indexed with itself. Using logical
operators, NumPy arrays can be filtered as desired. Consider a scenario,
where we need to filter array values which are greater than a certain thresh-
old. This is shown below:
Here, we create an array with the name rand_arr with 20 random values.
We then try to subset it with values which are greater than 30 using the
logical operator >. When an array is being sliced using the logical operator,
NumPy generates an anonymous array of True and False values which is
then used to subset the array. To illustrate this, let us execute the code used
to subset the rand_arr, i.e. code written within the square brackets.
In []: filter_
Out[]:
array([False, False, True, False, True, False, True,
True, False, False, False, False, False, True,
True, False, False, False, True, True])
It returned a boolean array with only True and False values. Here, True
appears wherever the logical condition holds true. NumPy uses this
outputted array to subset the original array and returns only those values
where it is True.
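The filtering code is not shown in the text. A sketch of the two steps, assuming rand_arr holds random integers between 1 and 49 (the original value range is unknown):

```python
import numpy as np

np.random.seed(0)

# 20 random integers in the range [1, 50)
rand_arr = np.random.randint(1, 50, 20)

# Step 1: the comparison produces a boolean array of the same length.
filter_ = rand_arr > 30
print(filter_)

# Step 2: indexing with the boolean array keeps the True positions.
filtered = rand_arr[filter_]
print(filtered)
```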
Apart from this approach, NumPy provides a where function with which
we can achieve the same filtered output. We pass a logical condition to
where, and it tells us where the condition holds true, which lets us
select the matching values. We select all values greater than 30 from
rand_arr using where in the following example:
np.where(condition[, x, y])
In []: heights
Out[]:
array([153.69911134, 154.12173942, 150.35772942,
151.53160722, 153.27900307, 154.42448961,
153.25276742, 151.08520803, 154.13922276,
159.71336708, 151.45302507, 155.01280829,
156.9504274 , 154.40626961, 155.46637317,
156.36825413,151.5096344 , 156.75707004,
151.14597394, 153.03848597])
Usage 1: Without x and y parameters. Using the where method without the
optional parameter as illustrated in the following example would return
the index values of the original array where the condition is true.
The above code returned index values of the heights array where values
are greater than 153. This scenario is very similar to the one we have seen
above with the random array rand_arr where we tried to filter values
above 30. Here, the output is merely the index values. If we want the
original values, we need to subset the heights array using the output that
we obtained.
The output in the Usage 2 provides either True or False for all the elements
in the heights array in contrast to the Usage 1 where it returned index
values of only those elements where the condition was true. The optional
parameters can also be array like elements instead of scalars or static value
such as True or False.
Usage 3: With x and y being arrays. Now that we have quite a good un-
derstanding of how the where method works, it is fairly easy to guess the
output. The output will contain values from either x array or y array based
on the condition in the first argument. For example:
In []: x_array
Out[]:
array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30])
In []: y_array
Out[]:
array([111, 112, 113, 114, 115, 116, 117, 118, 119, 120,
121, 122, 123, 124, 125, 126, 127, 128, 129, 130])
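A minimal sketch of Usage 3, assuming all three arrays are cut down to their first five values so that their shapes match:

```python
import numpy as np

# First five values of the arrays shown above
heights = np.array([153.69911134, 154.12173942, 150.35772942,
                    151.53160722, 153.27900307])
x_array = np.arange(11, 16)     # 11, 12, 13, 14, 15
y_array = np.arange(111, 116)   # 111, 112, 113, 114, 115

# Take the value from x_array where the condition holds, else from y_array
result = np.where(heights > 153, x_array, y_array)
print(result)
```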
As expected, the output of the above code snippet contains values from
the array x_array where the value in the heights array is greater than 153;
otherwise, it contains the value from y_array.
# Printing close_price
In []: close_price
Out[]:
array([137, 138, 133, 132, 134, 139, 132, 138, 137, 135,
136, 134, 134, 139, 135, 133, 136, 139, 132, 134])
We are to generate trading signals based on the buy condition given to us,
i.e. we go long, or buy the stock, when the closing price is greater than the
average price of 135.45. This can be easily computed using the where method
as shown below:
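One way to write this computation is sketched here, using the close_price values printed above and the threshold of 135.45 stated in the text:

```python
import numpy as np

close_price = np.array([137, 138, 133, 132, 134, 139, 132, 138, 137, 135,
                        136, 134, 134, 139, 135, 133, 136, 139, 132, 134])

# 1 (buy) where the close is above the threshold, 0 (no signal) otherwise
signals = np.where(close_price > 135.45, 1, 0)
print(signals)
```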
# Printing signals
In []: signals
Out[]: array([1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0,
0, 1, 1, 0, 0])
The signals array contains the trading signals, where 1 represents a buy
signal and 0 represents no trading signal.
# Looping over a one-dimensional array
In []: for element in arr_1d:
...: print(element)
# Output
52
88
41
63
94
# Output
['a' 'b' 'c']
['d' 'e' 'f']
['g' 'h' 'i']
# Output
a
b
c
d
e
f
g
The output that we got can also be achieved using nditer() from
NumPy, which works irrespective of the number of dimensions.
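A small sketch of nditer() on a two-dimensional array. The name arr_2d is an assumption; its contents mirror the letters in the output shown above:

```python
import numpy as np

# Assumed 3 x 3 array of letters
arr_2d = np.array([['a', 'b', 'c'],
                   ['d', 'e', 'f'],
                   ['g', 'h', 'i']])

visited = []
# nditer visits every element in turn, so no nested loop is needed
for element in np.nditer(arr_2d):
    visited.append(element.item())  # element is a zero-dimensional array
    print(element)
```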
# Output
a
b
c
d
e
f
g
h
i
This brings us to the end of our journey with the NumPy module. The
examples provided above depict only a minimal set of NumPy functionalities.
Though not comprehensive, they should give us a pretty good feel for what
NumPy is and why we should be using it.
8. Similar to lists, NumPy arrays can also be sliced using square brackets
[], and indexing starts at 0.
9. It is also possible to slice NumPy arrays based on logical conditions.
The condition produces an array of boolean True or False values, based
on which other arrays are sliced or filtered. This is known as boolean
indexing.
Pandas Module
Pandas is a Python library to deal with sequential and tabular data. It in-
cludes many tools to manage, analyze and manipulate data in a convenient
and efficient manner. We can think of its data structures as akin to database
tables or spreadsheets.
Pandas is built on top of the NumPy library and has two primary data
structures, viz. Series (1-dimensional) and DataFrame (2-dimensional). It can
handle both homogeneous and heterogeneous data, and some of its many
capabilities are:
11.1.1 Installing with pip
The simplest way to install Pandas is from PyPI.
In a terminal window, run the following command.
In your code, you can use the escape character ’!’ to install pandas directly
from your Python console.
pip help
import pandas as pd
pd.test()
• Indexes and labels.
• Searching of elements.
• Insertion, deletion and modification of elements.
• Apply set techniques, such as grouping, joining, selecting, etc.
• Data processing and cleaning.
• Work with time series.
• Make statistical calculations.
• Draw graphics.
• Connectors for multiple data file formats, such as csv, xlsx, hdf5, etc.
Let’s see some examples of how to create and manipulate a Pandas Series:
import pandas as pd
s = pd.Series()
print(s)
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5, 6, 7])
print(s)
Out[]: 0 1
1 2
2 3
3 4
4 5
5 6
6 7
dtype: int64
Out[]: 0 0.383567
1 0.869761
2 1.100957
3 -0.259689
4 0.704537
dtype: float64
In all these examples, we have allowed the index label to appear by default
(without explicitly programming it). It starts at 0, and we can check the
index as:
In []: s.index
In []: s = pd.Series(np.random.randn(5),
index=['a', 'b', 'c', 'd', 'e'])
Out[]: a 1.392051
b 0.515690
c -0.432243
d -0.803225
e 0.832119
dtype: float64
import pandas as pd
dictionary = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}
s = pd.Series(dictionary)
print(s)
Out[]: a 1
b 2
c 3
d 4
e 5
dtype: int64
In this case, the Pandas Series is created with the dictionary keys as index
unless we specify any other index.
array = [1, 2, 3, 4, 5]
s2 = pd.Series(array)
Out[]: a 1
b 2
c 3
d 4
e 5
dtype: int64
0 1
1 2
2 3
3 4
4 5
dtype: int64
• Selecting one item from the Pandas Series by means of its index:
In []: s1['a']
Out[]: 1
In []: s2[0]
Out[]: 1
• Selecting several items from the Pandas Series by means of its index:
e 5
dtype: int64
In []: s2[2:]
Out[]: 2 3
3 4
4 5
dtype: int64
In []: s2[:2]
Out[]: 0 1
1 2
dtype: int64
Out[]: a 1
b 99
c 3
d 4
e 5
dtype: int64
In []: s2[1] = 99
print(s2)
Out[]: 0 1
1 99
2 3
3 4
4 5
dtype: int64
Here are some powerful vectorized operations that let us perform
calculations quickly, for example:
• Add, subtract, multiply, divide, power, and almost any NumPy func-
tion that accepts NumPy arrays.
s1 + 2
s1 - 2
s1 * 2
s1 / 2
s1 ** 2
np.exp(s1)
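The operations listed above can be tried on a small Series. This sketch assumes s1 is the dictionary-based Series created earlier:

```python
import numpy as np
import pandas as pd

s1 = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})

# Each operation is applied element by element and keeps the index
doubled = s1 * 2
squared = s1 ** 2
exponent = np.exp(s1)   # NumPy functions return a Series as well

print(doubled)
print(squared)
print(exponent)
```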
• We can perform the same operations over two Pandas Series. The operands
are aligned on their index; where the indexes differ, the result is indexed
by the union of the two, with NaN for labels missing from either Series.
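A short sketch of index alignment between two Series. The names s_left and s_right and their values are made up for illustration:

```python
import pandas as pd

# Two hypothetical Series whose indexes only partly overlap
s_left = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s_right = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Values are matched by label; labels present in only one Series give NaN
total = s_left + s_right
print(total)
```

Labels 'b' and 'c' exist in both Series and are added together, while 'a' and 'd' appear in the result with NaN.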
The index can be implicit, starting with zero or we can specify it ourselves,
even working with dates and times as indexes as well. Let’s see some ex-
amples of how to create and manipulate a Pandas DataFrame.
Out[]: A B C D E
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN
In []: array = {'A' : [1, 2, 3, 4],
'B' : [4, 3, 2, 1]}
pd.DataFrame(array)
Out[]: A B
0 1 4
1 2 3
2 3 2
3 4 1
Out[]: A B
2018-12-01 1 4
2018-12-02 2 3
2018-12-03 3 2
2018-12-04 4 1
Out[]: a b c d e
0 1 2 3 4 5
Out[]: A B C
0 0.164358 1.689183 1.745963
1 -1.830385 0.035618 0.047832
2 1.304339 2.236809 0.920484
3 0.365616 1.877610 -0.287531
4 -0.741372 -1.443922 -1.566839
5 -0.119836 -1.249112 -0.134560
6 -0.848425 -0.569149 -1.222911
7 -1.172688 0.515443 1.492492
8 0.765836 0.307303 0.788815
9 0.761520 -0.409206 1.298350
Out[]: A B C
0 0.164358 1.689183 1.745963
1 -1.830385 0.035618 0.047832
2 1.304339 2.236809 0.920484
Out[]: A B C
7 -1.172688 0.515443 1.492492
8 0.765836 0.307303 0.788815
9 0.761520 -0.409206 1.298350
import pandas as pd
df=pd.read_csv('Filename.csv')
type(df)
Out[]: pandas.core.frame.DataFrame
This simple operation loads the csv file into a Pandas DataFrame, after
which we can explore it as we have seen before.
In this example, we want to load a csv file with blank space as separator:
import pandas as pd
df=pd.read_csv('Filename.csv', sep=' ')
In this example, we want to load columns 0 to 5 and the first 100
rows:
import pandas as pd
df=pd.read_csv('Filename.csv', usecols=[0, 1, 2, 3, 4, 5],
nrows=100)
It’s possible to customize the headers, convert the column or row names,
and carry out a good number of other operations.
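To try the usecols and nrows parameters without a file on disk, here is a sketch that reads from an in-memory string. The column names and values are invented for illustration:

```python
import io
import pandas as pd

# Hypothetical csv content standing in for 'Filename.csv'
csv_text = (
    "Date,Open,High,Low,Close,Volume\n"
    "2018-12-03,10.0,11.0,9.5,10.5,1000\n"
    "2018-12-04,10.5,11.5,10.0,11.0,1200\n"
    "2018-12-05,11.0,12.0,10.5,11.5,900\n"
)

# Keep only columns 0 and 4 (Date and Close) and the first two data rows
df = pd.read_csv(io.StringIO(csv_text), usecols=[0, 4], nrows=2)
print(df)
```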
import pandas as pd
df=pd.read_excel('Filename.xls', sheet_name='Sheet1')
This simple operation loads Sheet1 from the Excel file into a Pandas
DataFrame.
Let’s begin with loading a csv file having details of a market instrument.
(1000, 5)
Here, we have read a csv file, of which we only need the columns of date,
opening, closing, high and low (the first 5 columns), and we check the shape
of the DataFrame, which has 1000 rows and 5 columns.
We can apply selection filters to the DataFrame itself, to select one column
to work with. For example, we could need the Close column:
In []: close=df['Close']
close.head()
Out[]: Close
0 18.96
1 19.34
2 19.44
3 20.33
4 20.52
3 20.33 50084000.0
4 20.52 61475200.0
Out[]:
Close Volume
100 320.08 4236029.0
101 320.87 6942493.0
102 326.17 4980316.0
103 325.84 8547764.0
104 337.34 4463807.0
105 337.02 5715817.0
106 345.10 4888221.0
In []: df.loc[100:110]
Out[]:
107 351.81 5032884.0
108 359.65 4898808.0
109 355.75 3280670.0
110 350.60 5353262.0
In []: df.iloc[100:110]
Out[]:
In the last example, we used the index as an integer position rather than by
label.
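The difference between the two selectors shows up in how they treat the end of a slice. A minimal sketch, assuming a hypothetical DataFrame with the default integer index:

```python
import pandas as pd

# Hypothetical data with the default index 0..199
df = pd.DataFrame({'Close': range(200)})

by_label = df.loc[100:110]       # label-based: both end points are included
by_position = df.iloc[100:110]   # position-based: the end point is excluded

print(len(by_label), len(by_position))
```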
We can select a set of rows and columns like before:
Date Open ... Close ... Name
1 2017-03-26 307.34 ... 304.18 ... TSLA
2 2017-03-23 311.25 ... 301.54 ... TSLA
... ... ... ... ... ...
1080 2017-12-09 137.00 ... 141.60 ... TSLA
1081 2017-12-06 141.51 ... 137.36 ... TSLA
1082 2013-12-05 140.51 ... 140.48 ... TSLA
We’ll see how to sort it, re-index it, eliminate unwanted (or spurious) data,
add or remove columns and update values.
df2.T
Out[]:
In []: df.sort_index()
Out[]:
In []: df.sort_values(by='Close')
Out[]:
In []: df.sort_values(by=['Open', 'Close'])
Out[]:
Out[]:
0
a -0.134133
b -0.586051
c 1.179358
d 0.433142
e -0.365686
When the index is numeric, we can use the same function to reorder the
index by hand:
0
4 1.058589
3 1.194400
2 -0.645806
1 0.836606
0 1.288102
Later in this section, we’ll see how to work and reorganize date and time
indices.
0
0 0.238304
1 2.068558
2 1.015650
3 0.506208
4 0.214760
To add a new column, we only need to include the new column name in the
DataFrame and assign an initial value, or assign to the new column a
Pandas Series or a column from another DataFrame.
In []: df['new']=1
df
Out[]:
0 new
0 0.238304 1
1 2.068558 1
2 1.015650 1
3 0.506208 1
4 0.214760 1
Now, we can delete a column, specifying it by index or by label:
1 2 3
0 -0.086348 -1.971855 1.168017
1 -0.061397 -0.542212 -1.412755
2 -0.587147 1.494690 1.756105
3 0.924202 0.517975 -0.914366
4 -0.431151 -0.401093 0.145646
In []: df['new']=1
df
Out[]:
0 new
0 0.238304 1
1 2.068558 1
2 1.015650 1
3 0.506208 1
0
0 0.238304
1 2.068558
2 1.015650
3 0.506208
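Adding and then deleting a column can be sketched as follows. Using drop with the columns keyword is one option and an assumption here, since the original snippet is not reproduced above; del df['new'] is another:

```python
import numpy as np
import pandas as pd

# One column of random numbers, labelled 0 by default
df = pd.DataFrame(np.random.randn(5, 1))

# Add a column by assigning a scalar to a new label
df['new'] = 1
with_new = df.copy()

# Remove the column again; drop returns a new DataFrame
df = df.drop(columns=['new'])
print(df)
```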
A B C D
a 0.996496 -0.165002 0.727912 0.564858
b -0.388169 1.171039 -0.231934 -1.124595
c -1.385129 0.299195 0.573570 -1.736860
d 1.222447 -0.312667 0.957139 -0.054156
e 1.188335 0.679921 1.508681 -0.677776
Out[]: 0.996496
In []: df.iat[0, 0] = 0
Out[]:
A B C D
a 0.000000 -0.165002 0.727912 0.564858
b -0.388169 1.171039 -0.231934 -1.124595
c -1.385129 0.299195 0.573570 -1.736860
d 1.222447 -0.312667 0.957139 -0.054156
e 1.188335 0.679921 1.508681 -0.677776
11.7.9 Conditional updating of values
Another useful operation is to update values that meet some criteria, for
example, to update values that are greater than 0:
df[df > 0] = 1
df
Out[]:
A B C D
a 1.000000 -0.082466 1.000000 -0.728372
b -0.784404 -0.663096 -0.595112 1.000000
c -1.460702 -1.072931 -0.761314 1.000000
d 1.000000 1.000000 1.000000 -0.302310
e -0.488556 1.000000 -0.798716 -0.590920
We can also update the values of a specific column that meet some criteria,
or even work with several columns as criteria and update a specific column.
Out[]:
A B C D
a 1.0 -0.082466 1.000000 -0.728372
b 1.0 -0.663096 -0.595112 1.000000
c 1.0 -1.072931 -0.761314 1.000000
A B C D
a 1.0 -0.082466 1.000000 -0.728372
b 9.0 -0.663096 -0.595112 1.000000
c 9.0 -1.072931 -0.761314 1.000000
d 1.0 1.000000 1.000000 -0.302310
e 1.0 1.000000 -0.798716 -0.590920
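One way to obtain results like the two outputs above is boolean indexing with loc. This is a sketch under assumptions: the conditions are inferred from the outputs, and the values are rounded from the table shown earlier:

```python
import pandas as pd

# Rounded values of columns A, B and C from the DataFrame above
df = pd.DataFrame({'A': [1.0, -0.78, -1.46, 1.0, -0.49],
                   'B': [-0.08, -0.66, -1.07, 1.0, 1.0],
                   'C': [1.0, -0.60, -0.76, 1.0, -0.80]},
                  index=['a', 'b', 'c', 'd', 'e'])

# Update column A wherever its own value is negative
df.loc[df['A'] < 0, 'A'] = 1

# Use two other columns as the criteria and update column A
df.loc[(df['B'] < 0) & (df['C'] < 0), 'A'] = 9
print(df)
```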
Out[]:
A B C D
a 1.272361 1.799535 -0.593619 1.152889
b -0.318368 -0.190419 0.129420 1.551332
c 0.166951 1.669034 -1.653618 0.656313
d 0.219999 0.951074 0.442325 -0.170177
e 0.312319 -0.765930 -1.641234 -1.388924
Out[]:
A B C D
a 1.272361 1.799535 -0.593619 1.152889
b -0.318368 -0.190419 0.129420 1.551332
c 0.166951 1.669034 -1.653618 0.656313
d 0.219999 0.951074 0.442325 -0.170177
e NaN -0.765930 -1.641234 -1.388924
In []: df=df.dropna()
print(df)
Out[]:
A B C D
a 1.272361 1.799535 -0.593619 1.152889
b -0.318368 -0.190419 0.129420 1.551332
c 0.166951 1.669034 -1.653618 0.656313
d 0.219999 0.951074 0.442325 -0.170177
Here we are deleting every row that has a NaN value in any of its columns,
but we can also delete every column in which any value is NaN:
df=df.dropna(axis=1)
print(df)
A B C D
a 1.272361 1.799535 -0.593619 1.152889
b -0.318368 -0.190419 0.129420 1.551332
c 0.166951 1.669034 -1.653618 0.656313
d 0.219999 0.951074 0.442325 -0.170177
e 0.312319 -0.765930 -1.641234 -1.388924
A B C D
a 1.272361 1.799535 -0.593619 1.152889
b -0.318368 -0.190419 0.129420 1.551332
c 0.166951 1.669034 -1.653618 0.656313
d 0.219999 0.951074 0.442325 -0.170177
e NaN -0.765930 -1.641234 -1.388924
In []: df=df.fillna(999)
print(df)
Out[]:
A B C D
a 1.272361 1.799535 -0.593619 1.152889
b -0.318368 -0.190419 0.129420 1.551332
c 0.166951 1.669034 -1.653618 0.656313
d 0.219999 0.951074 0.442325 -0.170177
e 999 -0.765930 -1.641234 -1.388924
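The three ways of handling NaN values seen above can be compared side by side on a small, made-up DataFrame:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a single missing value in column A
df = pd.DataFrame({'A': [1.27, -0.32, np.nan],
                   'B': [1.80, -0.19, -0.77]},
                  index=['a', 'b', 'e'])

rows_dropped = df.dropna()        # drop rows containing NaN
cols_dropped = df.dropna(axis=1)  # drop columns containing NaN
filled = df.fillna(999)           # replace NaN with a fixed value

print(rows_dropped)
print(cols_dropped)
print(filled)
```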
A B C D
a -0.633249 -2.699088 0.574052 0.652742
def square_number(number):
return number**2
A B C D
a 0.401005 7.285074 0.329536 0.426073
b 0.003636 0.022658 0.022238 0.491704
c 0.002758 0.220412 0.808524 0.370161
d 1.830372 0.010671 0.209652 3.599253
e 0.007793 0.174989 1.216586 0.339254
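A sketch of how a function such as square_number can be applied to every value of a DataFrame. Using DataFrame.apply is an assumption here, and the data is made up:

```python
import pandas as pd

def square_number(number):
    return number**2

# Hypothetical data
df = pd.DataFrame({'A': [-0.5, 2.0], 'B': [3.0, -1.5]}, index=['a', 'b'])

# apply passes each column (a Series) to the function; since ** works
# element-wise on a Series, every value ends up squared
squared = df.apply(square_number)
print(squared)
```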
df=pd.DataFrame(np.random.randn(5, 4),
index=['a','b','c','d','e'],
columns=['A', 'B', 'C', 'D'])
print(df)
Out[]:
A B C D
a -0.633249 -2.699088 0.574052 0.652742
b 0.060295 -0.150527 0.149123 -0.701216
c -0.052515 0.469481 0.899180 -0.608409
d -1.352912 0.103302 0.457878 -1.897170
e 0.088279 0.418317 -1.102989 0.582455
In []: df['D'].shift(1)
Out[]:
A B C D
a -0.633249 -2.699088 0.574052 NaN
b 0.060295 -0.150527 0.149123 0.652742
c -0.052515 0.469481 0.899180 -0.701216
d -1.352912 0.103302 0.457878 -0.608409
e 0.088279 0.418317 -1.102989 -1.897170
A B C D
a -0.633249 -2.699088 0.574052 -0.701216
b 0.060295 -0.150527 0.149123 -0.608409
c -0.052515 0.469481 0.899180 -1.897170
d -1.352912 0.103302 0.457878 0.582455
e 0.088279 0.418317 -1.102989 NaN
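The two outputs above differ only in the direction of the shift. A minimal sketch on a single Series, with values rounded from column D:

```python
import pandas as pd

# Rounded values of column D
s = pd.Series([0.65, -0.70, -0.61, -1.90, 0.58],
              index=['a', 'b', 'c', 'd', 'e'])

lagged = s.shift(1)    # values move down one row; the first becomes NaN
led = s.shift(-1)      # values move up one row; the last becomes NaN

print(lagged)
print(led)
```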
A B C D
a -0.633249 -2.699088 0.574052 0.652742
b 0.060295 -0.150527 0.149123 -0.701216
c -0.052515 0.469481 0.899180 -0.608409
d -1.352912 0.103302 0.457878 -1.897170
e 0.088279 0.418317 -1.102989 0.582455
In []: df.info()
D 5 non-null float64
shift 4 non-null float64
dtypes: float64(5)
memory usage: 240.0+ bytes
A B C D
a -0.633249 -2.699088 0.574052 0.652742
b 0.060295 -0.150527 0.149123 -0.701216
c -0.052515 0.469481 0.899180 -0.608409
d -1.352912 0.103302 0.457878 -1.897170
e 0.088279 0.418317 -1.102989 0.582455
In []: df.describe()
Out[]:
A B C D
count 5.000000 5.000000 5.000000 5.000000
mean -0.378020 -0.371703 0.195449 -0.394319
std 0.618681 1.325046 0.773876 1.054633
min -1.352912 -2.699088 -1.102989 -1.897170
25% -0.633249 -0.150527 0.149123 -0.701216
50% -0.052515 0.103302 0.457878 -0.608409
In []: df['A'].value_counts()
Out[]: 0.088279 1
-0.052515 1
0.060295 1
-0.633249 1
-1.352912 1
Name: A, dtype: int64
Out[]: A -0.378020
B -0.371703
C 0.195449
D -0.394319
shift -0.638513
dtype: float64
Out[]: a -0.526386
b 0.002084
c 0.001304
d -0.659462
e -0.382222
dtype: float64
Out[]: A 0.618681
B 1.325046
C 0.773876
D 1.054633
shift 1.041857
dtype: float64
Out[]: a 1.563475
b 0.491499
c 0.688032
d 0.980517
e 1.073244
dtype: float64
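Outputs of this kind, one value per column or one value per row, can be produced with mean and std and their axis parameter. This sketch assumes those are the calls behind the outputs above and uses made-up data:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 6.0, 8.0]})

col_mean = df.mean()         # one value per column
row_mean = df.mean(axis=1)   # one value per row
col_std = df.std()           # sample standard deviation per column
row_std = df.std(axis=1)     # sample standard deviation per row

print(col_mean)
print(row_mean)
print(col_std)
print(row_std)
```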
A B C D
a -0.633249 -2.699088 0.574052 0.652742
b 0.060295 -0.150527 0.149123 -0.701216
c -0.052515 0.469481 0.899180 -0.608409
d -1.352912 0.103302 0.457878 -1.897170
e 0.088279 0.418317 -1.102989 0.582455
A B C D
b 0.060295 -0.150527 0.149123 -0.701216
e 0.088279 0.418317 -1.102989 0.582455
We can also combine logical statements. Here, we filter all rows whose
columns ’A’ and ’B’ both have values greater than zero.
A B C D
e 0.088279 0.418317 -1.102989 0.582455
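The two filters can be sketched with columns A and B of the DataFrame shown above. Note the parentheses around each condition and the use of & rather than the and keyword:

```python
import pandas as pd

# Columns A and B of the DataFrame shown above
df = pd.DataFrame({'A': [-0.633249, 0.060295, -0.052515, -1.352912, 0.088279],
                   'B': [-2.699088, -0.150527, 0.469481, 0.103302, 0.418317]},
                  index=['a', 'b', 'c', 'd', 'e'])

# Single condition: rows where A is positive
positive_a = df[df['A'] > 0]

# Combined condition: rows where both A and B are positive
positive_both = df[(df['A'] > 0) & (df['B'] > 0)]

print(positive_a)
print(positive_both)
```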
11.10 Iterating Pandas DataFrame
We can go through the DataFrame row by row to do operations in each
iteration. Let’s see some examples.
In []: for item in df.iterrows():
print(item)
Out[]:
('a', A -0.633249
B -2.699088
C 0.574052
D 0.652742
shift NaN
Name: a, dtype: float64)
('b', A 0.060295
B -0.150527
C 0.149123
D -0.701216
shift 0.652742
Name: b, dtype: float64)
('c', A -0.052515
B 0.469481
C 0.899180
D -0.608409
shift -0.701216
Name: c, dtype: float64)
('d', A -1.352912
B 0.103302
C 0.457878
D -1.897170
shift -0.608409
Name: d, dtype: float64)
('e', A 0.088279
B 0.418317
C -1.102989
D 0.582455
shift -1.897170
Name: e, dtype: float64)
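As the output shows, each item is a tuple of the index label and the row as a Series. A small sketch, with made-up data, that unpacks the tuple directly in the loop:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({'A': [1, 2], 'B': [10, 20]}, index=['a', 'b'])

totals = {}
# Unpack each (label, row) pair; row values are accessed by column name
for label, row in df.iterrows():
    totals[label] = row['A'] + row['B']

print(totals)
```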
A B C D
a 1.179924 -1.512124 0.767557 0.019265
b 0.019969 -1.351649 0.665298 -0.989025
c 0.351921 -0.792914 0.455174 0.170751
d -0.150499 0.151942 -0.628074 -0.347300
e -1.307590 0.185759 0.175967 -0.170334
A B C D
a 2.030462 -0.337738 -0.894440 -0.757323
b 0.475807 1.350088 -0.514070 -0.843963
c 0.948164 -0.155052 -0.618893 1.319999
d 1.433736 -0.455008 1.445698 -1.051454
e 0.565345 1.802485 -0.167189 -0.227519
In []: df3 = pd.merge(df1, df2)
print(df3)
A B C D
a 1.179924 -1.512124 0.767557 0.019265
b 0.019969 -1.351649 0.665298 -0.989025
c 0.351921 -0.792914 0.455174 0.170751
d -0.150499 0.151942 -0.628074 -0.347300
e -1.307590 0.185759 0.175967 -0.170334
a 2.030462 -0.337738 -0.894440 -0.757323
b 0.475807 1.350088 -0.514070 -0.843963
c 0.948164 -0.155052 -0.618893 1.319999
d 1.433736 -0.455008 1.445698 -1.051454
e 0.565345 1.802485 -0.167189 -0.227519
A B C D
a 1.179924 -1.512124 0.767557 0.019265
b 0.019969 -1.351649 0.665298 -0.989025
c 0.351921 -0.792914 0.455174 0.170751
d -0.150499 0.151942 -0.628074 -0.347300
e -1.307590 0.185759 0.175967 -0.170334
a 2.030462 -0.337738 -0.894440 -0.757323
b 0.475807 1.350088 -0.514070 -0.843963
c 0.948164 -0.155052 -0.618893 1.319999
d 1.433736 -0.455008 1.445698 -1.051454
e 0.565345 1.802485 -0.167189 -0.227519
A B C D
a 1.179924 -1.512124 0.767557 0.019265
b 0.019969 -1.351649 0.665298 -0.989025
c 0.351921 -0.792914 0.455174 0.170751
d -0.150499 0.151942 -0.628074 -0.347300
e -1.307590 0.185759 0.175967 -0.170334
a 2.030462 -0.337738 -0.894440 -0.757323
b 0.475807 1.350088 -0.514070 -0.843963
c 0.948164 -0.155052 -0.618893 1.319999
d 1.433736 -0.455008 1.445698 -1.051454
e 0.565345 1.802485 -0.167189 -0.227519
# Concat by column
In []: df3 = pd.concat([df1, df2], axis=1)
print(df3)
Out[]:
A B ... D A ... D
a 1.179924 -1.512124 ... 0.019265 2.030462 ... -0.757323
b 0.019969 -1.351649 ... -0.989025 0.475807 ... -0.843963
c 0.351921 -0.792914 ... 0.170751 0.948164 ... 1.319999
d -0.150499 0.151942 ... -0.347300 1.433736 ... -1.051454
e -1.307590 0.185759 ... -0.170334 0.565345 ... -0.227519
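The effect of the axis parameter of concat can be checked on two small DataFrames. The values here are invented for illustration:

```python
import pandas as pd

# Two hypothetical DataFrames with the same columns and index
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['a', 'b'])
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]}, index=['a', 'b'])

by_row = pd.concat([df1, df2])          # stack the rows (default axis=0)
by_col = pd.concat([df1, df2], axis=1)  # place the columns side by side

print(by_row)
print(by_col)
```

In both cases the labels are kept as they are, so by_row has repeated index labels and by_col has repeated column names.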
Out[]: DatetimeIndex([
'2018-12-01 00:00:00', '2018-12-01 01:00:00',
'2018-12-01 02:00:00', '2018-12-01 03:00:00',
'2018-12-01 04:00:00', '2018-12-01 05:00:00',
'2018-12-01 06:00:00', '2018-12-01 07:00:00',
'2018-12-01 08:00:00', '2018-12-01 09:00:00',
'2018-12-01 10:00:00', '2018-12-01 11:00:00',
'2018-12-01 12:00:00', '2018-12-01 13:00:00',
'2018-12-01 14:00:00', '2018-12-01 15:00:00',
'2018-12-01 16:00:00', '2018-12-01 17:00:00',
'2018-12-01 18:00:00', '2018-12-01 19:00:00',
'2018-12-01 20:00:00', '2018-12-01 21:00:00',
'2018-12-01 22:00:00', '2018-12-01 23:00:00',
'2018-12-02 00:00:00', '2018-12-02 01:00:00',
'2018-12-02 02:00:00', '2018-12-02 03:00:00',
'2018-12-02 04:00:00', '2018-12-02 05:00:00'],
dtype='datetime64[ns]', freq='H')
We can do the same to get a daily frequency² (or any other, as per our
requirement). We can use the freq parameter to adjust this.
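A minimal sketch of a daily range. The start date and the number of periods are assumptions chosen to match the dates used in this section:

```python
import pandas as pd

# Eight consecutive days starting on 1 December 2018
rng = pd.date_range('2018-12-01', periods=8, freq='D')
print(rng)
```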
Now, we have a DatetimeIndex in the rng object and we can use it to create
a Series or DataFrame:
2 https://pandas.pydata.org/pandas-docs/stable/reference/api/pandasrange.html
In []: ts = pd.DataFrame(np.random.randn(len(rng), 4),
index=rng, columns=['A', 'B', 'C', 'D'])
print(ts)
Out[]:
A B C D
2018-12-01 0.048603 0.968522 0.408213 0.921774
2018-12-02 -2.301373 -2.310408 -0.559381 -0.652291
2018-12-03 -2.337844 0.329954 0.289221 0.259132
2018-12-04 1.357521 0.969808 1.341875 0.767797
2018-12-05 -1.212355 -0.077457 -0.529564 0.375572
2018-12-06 -0.673065 0.527754 0.006344 -0.533316
2018-12-07 0.226145 0.235027 0.945678 -1.766167
2018-12-08 1.735185 -0.604229 0.274809 0.841128
Sometimes, we read the data from internet sources or from csv files and we
need to convert the date column into the index to work properly with the
Series or DataFrame.
Here, we can see a numeric index and a Date column. Let’s convert this
column into the index, so that our DataFrame, read from a csv file, is indexed
by time. For this, we are going to use the Pandas set_index method.
In []: df = df.set_index('Date')
df.tail()
Out[]:
each frequency interval. For example, if we go from an hourly frequency
to daily, we must specify what we want to do with the group of data that falls
inside each day: we can take a mean, a sum, the maximum
or the minimum, etc.
A B C D
2018-12-01 00:00:00 0.048603 0.968522 0.408213 0.921774
2018-12-01 01:00:00 -2.301373 -2.310408 -0.559381 -0.652291
2018-12-01 02:00:00 -2.337844 0.329954 0.289221 0.259132
2018-12-01 03:00:00 1.357521 0.969808 1.341875 0.767797
2018-12-01 04:00:00 -1.212355 -0.077457 -0.529564 0.375572
2018-12-01 05:00:00 -0.673065 0.527754 0.006344 -0.533316
In []: ts = ts.resample("1D").mean()
print(ts)
Out[]:
A B C D
2018-12-01 0.449050 0.127412 -0.154179 -0.358324
2018-12-02 -0.539007 -0.855894 0.000010 0.454623
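To make the aggregation visible, here is a sketch with 30 hourly values that are simply 0 to 29 (an assumption for readability). The hourly frequency is passed as a Timedelta, which date_range also accepts:

```python
import numpy as np
import pandas as pd

# 30 hourly timestamps: 24 fall on 1 December and 6 on 2 December
rng = pd.date_range('2018-12-01', periods=30, freq=pd.Timedelta(hours=1))
ts = pd.DataFrame({'A': np.arange(30.0)}, index=rng)

# Each day's values are collapsed with the chosen aggregation
daily_mean = ts.resample('1D').mean()
daily_max = ts.resample('1D').max()

print(daily_mean)
print(daily_max)
```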
A B C D
2018-12-01 0.048603 0.968522 0.408213 0.921774
2018-12-02 -2.301373 -2.310408 -0.559381 -0.652291
2018-12-03 -2.337844 0.329954 0.289221 0.259132
2018-12-04 1.357521 0.969808 1.341875 0.767797
2018-12-05 -1.212355 -0.077457 -0.529564 0.375572
2018-12-06 -0.673065 0.527754 0.006344 -0.533316
In []: ts['2018-12-15':]
Out[]:
A B C D
2018-12-02 0.324689 -0.413723 0.019163 0.385233
2018-12-03 -2.198937 0.536600 -0.540934 -0.603858
2018-12-04 -1.195148 2.191311 -0.981604 -0.942440
2018-12-05 0.621298 -1.435266 -0.761886 -1.787730
2018-12-06 0.635679 0.683265 0.351140 -1.451903
In []: ts['2018-12-15':'2018-12-20']
Out[]:
A B C D
2018-12-15 0.605576 0.584369 -1.520749 -0.242630
2018-12-16 -0.105561 -0.092124 0.385085 0.918222
2018-12-17 0.337416 -1.367549 0.738320 2.413522
2018-12-18 -0.011610 -0.339228 -0.218382 -0.070349
2018-12-19 0.027808 -0.422975 -0.622777 0.730926
2018-12-20 0.188822 -1.016637 0.470874 0.674052
Matplotlib is a popular Python library that can be used to create data
visualizations quite easily. It is probably the single most used Python package for
2D graphics, along with limited support for 3D graphics. It provides both
a very quick way to visualize data from Python and publication-quality
figures in many formats. It was also designed from the beginning to serve
two purposes:
Much like Python itself, Matplotlib gives the developer complete control
over the appearance of their plots. It tries to make easy things easy and
hard things possible. We can generate plots, histograms, power spectra,
bar charts, error charts, scatter plots, etc. with just a few lines of code. For
simple plotting, the pyplot module within matplotlib package provides a
MATLAB-like interface to the underlying object-oriented plotting library. It
implicitly and automatically creates figures and axes to achieve the desired
plot.
functionality. Also, if we are working in a Jupyter Notebook, the line
%matplotlib inline becomes important, as it makes sure that the plots
are embedded inside the notebook. This is demonstrated in the example
below:
%matplotlib inline
NOTE : Matplotlib does not fall under the Python Standard Li-
brary and hence, like any other third party library, it needs to
be installed before it can be used. It can be installed using the
command pip install matplotlib.
• Axes is where the plotting occurs. The axes are effectively the area
that we plot data on. Each Axes has an X-Axis and a Y-Axis.
fig = plt.figure()
<Figure size 432x288 with 0 Axes>
Upon running the above example, nothing really happens. It only creates a
figure of size 432 x 288 with 0 Axes. Also, Matplotlib will not show anything
until told to do so. Python will wait for a call to the show method to display the
plot. This is because we might want to add some extra features to the plot
before displaying it, such as title and label customization. Hence, we need
to call the plt.show() method to show the figure as shown below:
plt.show()
12.1.1 Axes
All plotting happens with respect to an Axes. An Axes is made up of Axis
objects and many other things. An Axes object must belong to a Figure.
Most commands that we will ever issue will be with respect to this Axes
object. Typically, we will set up a Figure, and then add Axes on to it. We
can use fig.add_axes but in most cases, we find that adding a subplot fits
our need perfectly. A subplot is an axes on a grid system.
# -Example 1-
# Creating figure
fig = plt.figure()
# Creating subplot
# Sub plot with 1 row and 1 column at the index 1
ax = fig.add_subplot(111)
plt.show()
The above code adds a single plot to the figure fig with the help of
add_subplot() method. The output we get is a blank plot with axes
ranging from 0 to 1 as shown in figure 1.
We can customize the plot using a few more built-in methods. Let us add
the title, X-axis label, Y-axis label, and set limit range on both axes. This is
illustrated in the below code snippet.
# -Example 2-
fig = plt.figure()
# Creating subplot/axes
ax = fig.add_subplot(111)
Figure 2: An empty plot with title, labels and custom axis limits
# Creating subplot/axes
ax = fig.add_subplot(111)
plt.show()
The above code snippet gives the same output as figure 2, using the set
method with all required parameters passed as arguments.
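A sketch of such a set call. The title, label texts and limits are placeholders rather than the values behind figure 2, and the Agg backend line is an assumption so that the snippet also runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)

# One call to set() replaces separate set_title, set_xlabel, etc. calls
ax.set(title='An Axes Title', xlabel='X-Axis Label', ylabel='Y-Axis Label',
       xlim=[0, 5], ylim=[0, 10])
```

In an interactive session, plt.show() would then display the figure.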
# -Example 3-
# Creating subplots, setting title and axes labels
# using `pyplot`
plt.subplots()
plt.title('Plot using pyplot')
plt.xlabel('X-Axis Label')
plt.ylabel('Y-Axis Label')
plt.show()
The code above is more intuitive and has fewer variables to construct a plot.
The output for the same is shown in figure 3. It uses implicit calls to axes
method for plotting. However, if we take a look at "The Zen of Python" (try
import this), it says:
Explicit is better than implicit.
While very simple plots, with short scripts, would benefit from the con-
ciseness of the pyplot implicit approach, when doing more complicated
plots, or working within larger scripts, we will want to explicitly pass
around the axes and/or figure object to operate upon. We will be using
both approaches here, wherever appropriate.
fig = plt.figure()
ax = fig.add_subplot(111)
Figure 3: An empty plot using pyplot
fig, ax = plt.subplots()
Both versions of code produce the same output. However, the latter version
is cleaner.
# -Example 4-
# Creating subplots with 2 rows and 2 columns
fig, axes = plt.subplots(nrows=2, ncols=2)
plt.show()
Upon running the above code, Matplotlib would generate a figure with four
subplots arranged with two rows and two columns as shown in figure 4.
The axes object that was returned here would be a 2D-NumPy array, and
each item in the array is one of the subplots. Therefore, when we want to
work with one of these axes, we can index it and use that item’s methods.
Let us add the title to each subplot using the axes methods.
# -Example 5-
# Create a figure with four subplots and shared axes
fig, axes = plt.subplots(nrows=2, ncols=2, sharex=True,
sharey=True)
axes[0, 0].set(title='Upper Left')
axes[0, 1].set(title='Upper Right')
axes[1, 0].set(title='Lower Left')
axes[1, 1].set(title='Lower Right')
plt.show()
The above code generates a figure with four subplots and shared X and
Y axes. Axes are shared among subplots in a row-wise and column-wise
manner. We then set a title for each subplot using the set method of each
subplot. Subplots are arranged in a grid and indexed by row and column, so
each subplot has a unique index. The output is shown in figure 5.
Figure 5: Subplots with shared axes
12.2 Plotting
We have discussed a lot about laying things out, but we haven’t really
discussed anything about plotting data yet. Matplotlib has various plotting
functions, many more than we will discuss and cover here. However, a
full list or gallery¹ can be a bit overwhelming at first. Hence, we will
condense it down and attempt to start with simpler plotting and then move
towards more complex plotting. The plot method of pyplot is one of the
most widely used methods in Matplotlib to plot the data. The syntax to call
the plot method is shown below:
The coordinates of the points or line nodes are given by x and y. The
optional parameter fmt is a convenient way of defining basic formatting like
color, marker, and style. The plot method is used to plot almost any kind
of data in Python. It tells Python what to plot and how to plot it, and also
allows customization of the plot being generated such as color, type, etc.
1 https://matplotlib.org/gallery/index.html
12.2.1 Line Plot
A line plot can be plotted using the plot method. It plots Y versus X as lines
and/or markers. Below we discuss a few scenarios for plotting a line. To plot
a line, we provide coordinates to be plotted along X and Y axes separately
as shown in the below code snippet.
# -Example 6-
# Defining coordinates to be plotted on X and Y axes
# respectively
x = [1.3, 2.9, 3.1, 4.7, 5.6, 6.5, 7.4, 8.8, 9.2, 10]
y = [95, 42, 69, 11, 49, 32, 74, 62, 25, 32]
# Plot lists 'x' and 'y'
plt.plot(x, y)
The above code plots values in the list x along the X-axis and values in the
list y along the Y-axis as shown in figure 6.
The call to plot can also take the minimum possible arguments, i.e. values for the Y-axis
only. In such a case, Matplotlib will implicitly consider the index of
elements in list y as the input to the X-axis as demonstrated in the below
example:
# -Example 7-
# Defining 'y' coordinates
y = [95, 42, 69, 11, 49, 32, 74, 62, 25, 32]
# Plot list 'y'
plt.plot(y)
Figure 6: Line plot with x and y as data
The plots created use the default line style and color. The optional
parameter fmt in the plot method is a convenient way of defining basic formatting
like color, marker, and line style. It is a shortcut string notation consisting
of color, marker, and line:
fmt = '[color][marker][line]'
Each of them is optional. If not provided, the value from the style cycle² is
used. We use this notation in the below example to change the line color:
# -Example 8-
# Plot line with green color
plt.plot(y, 'g')
Figure 7: Line plot with y as the only data
Following the fmt string notation, we changed the color of a line to green
using the character g which refers to the line color. This generates the plot
with green line as shown in figure 8. Likewise, markers are added using the
same notation as shown below:
# -Example 9-
# Plot continuous green line with circle markers
plt.plot(y, 'go-')
Here, the fmt parameters: g refers to the green color, o refers to circle mark-
ers and - refers to a continuous line to be plotted as shown in figure 9. This
formatting technique allows us to format a line plot in virtually any way
Figure 8: Line plot with green line
Figure 10: Line chart with asterisk markers
# -Example 10-
# Plot continuous green line with asterisk markers
plt.plot(y, 'g*-')
The output of the above code is figure 10 where the line and markers share
the same color, i.e. green specified by the fmt string. If we are to plot line
and markers with different colors, we can use multiple plot methods to
achieve the same.
# -Example 11-
# Plot list 'y'
plt.plot(y, 'g')
Figure 11: Plot with blue line and red markers
The above code plots the line along with red circle markers as seen in figure 11.
Here, we first plot the line with the default style and then plot
markers with the attributes r, referring to the red color, and o, referring to a circle. On
the same lines, we can plot multiple sets of data using the same technique.
The example given below plots two lists on the same plot.
Figure 12: Line plot with two lines
plt.plot(y2, 'b*--')
The output can be seen in figure 12 where both green and blue lines are
drawn on the same plot. We can achieve the same result as shown above
using a different technique as shown below:
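One such technique, offered here as an assumption since the original snippet is not reproduced, is to pass both data sets and their format strings to a single plot call. The values in y2 are made up, and the Agg backend line is only there so that the sketch runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt

y = [95, 42, 69, 11, 49, 32, 74, 62, 25, 32]
y2 = [35, 52, 96, 77, 36, 66, 50, 12, 35, 63]  # hypothetical second list

fig, ax = plt.subplots()

# Each data set is followed by its own fmt string;
# plot returns one line object per data set
lines = ax.plot(y, 'go-', y2, 'b*--')
```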
Essentially, the plot method makes it very easy to plot sequential data
Figure 13: Line plot from a NumPy array
structures such as lists, NumPy arrays, pandas Series, etc. Similar to
plotting lists, we can plot NumPy arrays directly via the plot method. Let us
plot a one-dimensional NumPy array. As we are executing code directly in the
IPython console, calling plt.show() is not required and hence, we will
not be calling it in subsequent examples. However, remember, it is
absolutely necessary to call it while writing Python code in a script in order to show a
plot.
# -Example 13-
# Importing NumPy library
import numpy as np
Figure 14: Line plot from 2-D NumPy array
# -Example 14-
# Creating a two dimensional array 'arr_2d' with 40 samples
# and shape of (20, 2)
arr_2d = np.random.normal(size=40).reshape(20, 2)
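As a minimal sketch of what happens when such a two-dimensional array is handed to the plot method (assuming plt.plot is called on arr_2d directly), matplotlib draws one line per column:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in IPython
import matplotlib.pyplot as plt
import numpy as np

# Two-dimensional array with 40 samples and shape (20, 2)
arr_2d = np.random.normal(size=40).reshape(20, 2)

# Each column of the array becomes a separate line with 20 points
lines = plt.plot(arr_2d)
print(len(lines))
```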
Let us now move our focus to plotting pandas data structures. The pandas
library follows the same conventions as matplotlib for plotting directly
from its data structures. pandas also provides a plot method equivalent to
the one provided by matplotlib. Hence, the plot method can be called
directly on pandas Series and DataFrame objects. The plot method on Series
and DataFrame is just a simple wrapper around plt.plot(). The below example
illustrates plotting a pandas Series object:
Figure 15: Line plot from a pandas series
# -Example 15-
# Importing necessary libraries
import pandas as pd
import numpy as np
In the above example, we call the plot method directly on the pandas Series
object ts, which outputs the plot shown in figure 15. Alternatively, we
could have called plt.plot(ts). Calling ts.plot() is equivalent to calling
plt.plot(ts), and both calls would result in almost the same output as
shown above. Additionally, the plot() method on pandas objects supports
almost every attribute that plt.plot() supports for formatting. For
example, calling the plot method on pandas objects with a color attribute
would result in a plot in the color given by its value. This is shown
below:
Figure 16: Line plot from a pandas series in green color
# -Example 16-
# Plotting pandas Series in green color
ts.plot(color='green')
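The two routes can be compared side by side in a small self-contained sketch. The series below is a hypothetical random walk over made-up dates, not the series used in the text, and the two-panel layout is only there to show both calls at once.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in IPython
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# ASSUMPTION: a made-up series, a random walk indexed by 100 calendar days
ts = pd.Series(np.random.randn(100).cumsum(),
               index=pd.date_range('2019-01-01', periods=100))

fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, figsize=(8, 6))
ts.plot(ax=ax1, color='green')  # pandas plotting interface
ax2.plot(ts, color='green')     # matplotlib called directly on the series
```

Both panels show the same data; the pandas version typically formats the date axis a little differently, which is why the outputs are only almost the same.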
import pandas as pd
# Fetch data
data = pd.read_csv('https://bit.ly/2WcsJE7', index_col=0,
                   parse_dates=True)
The dataframe data will contain stock data with dates as the index. An
excerpt of the downloaded data is shown below:
Now we can plot any column of the data dataframe by calling the plot method
on it. In the example given below, we plot the most recent 100 data points
from the Volume column of the dataframe:
# -Example 17-
# Plot volume column
data.Volume.iloc[:100].plot()
The output of the above code is shown in figure 17. With a dataframe, the
plot method is a convenient way to plot all of the columns with labels. In
other words, if we plot multiple columns, it also labels each column in the
legend. In the below example, we plot the AdjOpen and AdjClose columns
together, and the output is shown in figure 18.
# -Example 18-
data[['AdjOpen', 'AdjClose']][:50].plot()
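The labeling behavior can be checked without downloading anything. The sketch below builds a small synthetic dataframe (the prices are made up; only the column names follow the text) and reads the legend entries back from the plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in IPython
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# ASSUMPTION: synthetic stand-in for the stock dataframe, values are made up
dates = pd.date_range('2019-01-01', periods=60, freq='D')
adj_close = 100 + np.random.randn(60).cumsum()
data = pd.DataFrame({'AdjOpen': adj_close + np.random.randn(60),
                     'AdjClose': adj_close}, index=dates)

# pandas draws one line per column and uses the column names as legend labels
ax = data[['AdjOpen', 'AdjClose']].iloc[:50].plot()
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```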
The plot method generates a line plot by default when called on pandas
data structures. However, it can also produce a variety of other charts, as
we will see later in this chapter. With that, let's move on to scatter
plots.
Figure 17: Line plot of a volume column
12.2.2 Scatter Plot
Scatter plots are used to visualize the relationship between two different
data sets. Matplotlib provides the scatter method within the pyplot
sub-module, which can be used to generate scatter plots.
The x and y parameters are the data positions, and they can be array-like
sequential data structures. In some instances we have data in a format that
lets us access particular variables by string keys, for example, a Python
dictionary or a pandas dataframe. Matplotlib allows us to pass such an
object via the data keyword argument of the scatter method and plot
directly from it. The following example illustrates this using a
dictionary.
# -Example 19-
# Creating a dictionary with three key-value pairs
dictionary = {'a': np.linspace(1, 100, 50),
              'c': np.random.randint(0, 50, 50),
              'd': np.abs(np.random.randn(50)) * 100}
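A self-contained sketch of plotting from such a dictionary via the data keyword follows. The 'b' entry is an assumption: its definition is not shown in the listing above, so here it is simply 'a' plus random noise.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in IPython
import matplotlib.pyplot as plt
import numpy as np

dictionary = {'a': np.linspace(1, 100, 50),
              'c': np.random.randint(0, 50, 50),
              'd': np.abs(np.random.randn(50)) * 100}
# ASSUMPTION: hypothetical y-values, 'a' plus some noise
dictionary['b'] = dictionary['a'] + 10 * np.random.randn(50)

# With data=..., the strings are looked up as keys in the dictionary:
# x from 'a', y from 'b', colors from 'c' and marker sizes from 'd'
points = plt.scatter('a', 'b', c='c', s='d', data=dictionary)
```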
Figure 19: Scatter plot with different size and color
to color to be used and the argument s represents the size of a data point.
These arguments c and s are optional. The output is a scatter plot with
points of different sizes and colors, as shown in figure 19. When we omit
these optional arguments, we get a simple scatter plot in which all points
share the same color and size, as shown in the following example:
# -Example 20-
# Creating a scatter plot with a single color and size
plt.scatter(dictionary['a'], dictionary['b'])
The output of the above code will be a scatter plot as shown in figure 20.
To better understand the working of scatter plots, let us return to our old
friends: lists x and y. We defined them earlier when we learned about line
plots. To refresh our memory, we re-define the same lists below:
Figure 20: Scatter plot with the same size and color
x = [1.3, 2.9, 3.1, 4.7, 5.6, 6.5, 7.4, 8.8, 9.2, 10]
y = [95, 42, 69, 11, 49, 32, 74, 62, 25, 32]
Now that we have the data points ready, we can create a scatter plot from
them as below:
# -Example 21-
# Creating a scatter plot
plt.scatter(x, y, c=color, s=size)
Figure 21: Scatter plot of lists x and y
The scatter plot contains data points, each with a different color and
size (as they are randomly generated). The output is shown in figure 21.
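A runnable version of this scatter plot might look as follows. The way color and size are generated here is an assumption, since their definitions are not shown; the text only states that they are random.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in IPython
import matplotlib.pyplot as plt
import numpy as np

x = [1.3, 2.9, 3.1, 4.7, 5.6, 6.5, 7.4, 8.8, 9.2, 10]
y = [95, 42, 69, 11, 49, 32, 74, 62, 25, 32]

# ASSUMPTION: one random color value and one random marker size per point
color = np.random.rand(len(x))            # mapped to colors via the colormap
size = np.random.rand(len(x)) * 300 + 20  # marker area in points squared

points = plt.scatter(x, y, c=color, s=size)
```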
In finance, scatter plots are widely used to visually examine the relation
between two data sets. With our working knowledge of scatter plots, let’s
plot the AdjOpen and AdjClose prices of AAPL stock that we have in the
pandas dataframe data. When it comes to plotting data directly from a
pandas dataframe, we can almost always rely on the pandas plot method for
all sorts of plots. That is, we can use the plot method on the dataframe to
draw scatter plots just as we did for line plots. However, we need to
specify that we want a scatter plot using the argument kind='scatter', as
shown below:
# -Example 22-
# Plotting a scatter plot of 'AdjOpen' and 'AdjClose' of
# AAPL stock
data.plot(x='AdjOpen', y='AdjClose', kind='scatter')
plt.show()
Figure 22: Scatter plot of columns AdjOpen and AdjClose
for x and y coordinates along with the argument kind, which results in the
output shown in figure 22.
Visualizing the price pattern with a scatter plot suggests that open and
close prices are positively correlated. Furthermore, we can generate the
same plot using the plt.scatter method.
# Method 1
plt.scatter(x='AdjOpen', y='AdjClose', data=data)
plt.show()
# Method 2
plt.scatter(x=data['AdjOpen'], y=data['AdjClose'])
plt.show()
The first method uses the argument data to specify the data source,
whereas the second method selects the columns from the dataframe directly,
and hence there is no need to specify the data argument.
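The pandas route and the plt.scatter route can be tried on a synthetic dataframe. The prices below are made up (only the column names follow the text), and the correlation coefficient is computed as a numeric check on what the scatter plot suggests visually.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in IPython
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# ASSUMPTION: synthetic prices, close = open plus a small random move
open_prices = np.linspace(100, 150, 200)
data = pd.DataFrame({'AdjOpen': open_prices,
                     'AdjClose': open_prices + np.random.randn(200)})

fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))
# pandas: plot method with kind='scatter'
data.plot(x='AdjOpen', y='AdjClose', kind='scatter', ax=ax1)
# matplotlib: scatter with column names resolved through data=
ax2.scatter(x='AdjOpen', y='AdjClose', data=data)

# Pearson correlation between the two columns
corr = data['AdjOpen'].corr(data['AdjClose'])
print(round(corr, 3))
```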
12.2.3 Histogram Plots
A histogram is a graphical representation of the distribution of data. It
is a kind of bar graph and a great tool to visualize the frequency
distribution of data in a way that is easily understood by almost any
audience. To construct a histogram, the first step is to bin the range of
data values, that is, to divide the entire range into a series of
intervals, and then count how many values fall into each interval. Here,
the bins are consecutive and non-overlapping. In other words, a histogram
shows the data in the form of groups. The bins/groups go on the X-axis,
and the Y-axis shows the frequency of each bin/group.
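The binning step described above can be sketched with NumPy's histogram function, which returns the count per bin and the bin edges without drawing anything. The choice of four bins is arbitrary and only keeps the output short.

```python
import numpy as np

y = [95, 42, 69, 11, 49, 32, 74, 62, 25, 32]

# Divide the range [11, 95] into 4 equal-width, non-overlapping bins
# and count how many values fall into each of them
counts, edges = np.histogram(y, bins=4)

# Each bin includes its left edge; the last bin also includes its right edge
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {n} value(s)")
```

Every value lands in exactly one bin, so the counts add up to the number of data points.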
# -Example 23-
# Data values for creating a histogram
y = [95, 42, 69, 11, 49, 32, 74, 62, 25, 32]
# Creating a histogram
plt.hist(y)
plt.xlabel('Bins')
plt.ylabel('Frequency')
plt.show()
This is the simplest code possible to plot a histogram with minimal
arguments. We create a list of values, provide it to the hist method and
let it do the rest (creating bins, assigning each value to its
corresponding bin, plotting, etc.). It produces the plot shown in figure
23. The hist method also takes bins as an optional argument. If this
argument is specified, bins are created as per the specified value;
otherwise, hist creates bins on its own (ten by default). To illustrate
this, we explicitly specify the number of bins in the above code and
generate the plot. The modified code and output are shown below:
Figure 23: A histogram
# -Example 24-
# Data values for creating a histogram
y = [95, 42, 69, 11, 49, 32, 74, 62, 25, 32]
# Creating a histogram
plt.hist(y, bins=20)
plt.xlabel('Bins')
plt.ylabel('Frequency')
plt.show()
Figure 24: Histogram with 20 bins
# -Example 25-
# Creating an array
array = np.random.normal(0, 1, 10000)
# Creating a histogram
plt.hist(array)
plt.xlabel('Bins')
plt.ylabel('Frequency')
plt.show()
The output in figure 25 shows that the data distribution indeed resembles
a normal distribution. Apart from the bins argument, other arguments that
can be provided to hist are color and histtype. There are a number of
arguments that can be provided, but we will limit our discussion to these
few. The color of a histogram can be changed using the color argument. The
histtype argument takes one of the pre-defined values bar, barstacked,
step and stepfilled. The below example illustrates the usage of these
arguments, and the output is shown in figure 26.
# -Example 26-
# Creating an array
array = np.random.normal(0, 1, 10000)
Figure 25: Histogram of an array
plt.xlabel('Bins')
plt.ylabel('Frequency')
plt.show()
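A self-contained sketch of the color and histtype arguments is given below. The particular color and number of bins are assumptions chosen for illustration; histtype='step' follows the caption of figure 26.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in IPython
import matplotlib.pyplot as plt
import numpy as np

# 10,000 samples from a normal distribution with mean 0 and std 1
array = np.random.normal(0, 1, 10000)

# ASSUMPTION: purple outline and 30 bins are illustrative choices
counts, edges, patches = plt.hist(array, bins=30, color='purple',
                                  histtype='step')
plt.xlabel('Bins')
plt.ylabel('Frequency')
```

The hist method returns the counts and bin edges alongside the drawn artists, which is handy when the numbers are needed as well as the picture.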
# -Example 27-
# Creating an array
array = np.random.normal(0, 1, 10000)
Figure 26: Histogram with histtype='step'
Figure 28: Histogram of a volume column
around the hist function in matplotlib, as was the case with scatter plots.
To plot a histogram, we need to specify the argument kind with the value
hist when calling plot directly on the dataframe. We will be working with
the same dataframe data that contains historical data for AAPL stock.
In the first method, we call the plot method directly on the Volume column
selected from the dataframe data, whereas in the second method, we use the
hist method provided by the matplotlib.pyplot module to plot the histogram.
Both methods produce the same result, as shown in figure 28.
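The two methods described here can be sketched as follows. The dataframe is a synthetic stand-in with made-up volumes, and the side-by-side layout is an illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in IPython
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# ASSUMPTION: made-up traded volumes standing in for the AAPL data
data = pd.DataFrame({'Volume': np.random.randint(1_000_000, 5_000_000, 250)})

fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))
# Method 1: pandas plot method with kind='hist'
data['Volume'].plot(kind='hist', ax=ax1)
# Method 2: the hist method from matplotlib
ax2.hist(data['Volume'])
```

Both calls use ten bins by default, so the two panels show the same bars.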
Figure 29: Line plot of close prices
12.3 Customization
Now that we have a good understanding of plotting various types of charts
and their basic formatting techniques, we can delve deeper and look at some
more formatting techniques. We already learned that matplotlib does not add
any styling components on its own. It plots a simple plain chart by
default. We, as users, need to specify whatever customization we need. We
start with a simple line plot and keep improving it. The following example
plots the close prices of the AAPL ticker that are available to us in the
dataframe data.
# -Example 29-
# Extracting close prices from the dataframe
close_prices = data['AdjClose']
Figure 30: Line plot with rotated xticks
are something that we don’t want. They all overlap with each other. This
happens because the plot method did not find sufficient space for each
date. One way to overcome this issue is to rotate the values on the X-axis
so that they no longer collide.
# -Example 30-
plt.plot(close_prices)
The xticks method along with the rotation argument is used to rotate the
values/tick names along the x-axis. The output of this approach is shown in
figure 30. Another approach to resolve the overlapping issue is to increase
the figure size of the plot so that matplotlib can show the values without
overlapping. This is shown in the below example, and the output is shown in
figure 31:
# -Example 31-
# Creating a figure with the size 10 inches by 5 inches
Figure 31: Line plot with custom figure size
Similarly, matplotlib provides the yticks method that can be used to
customize the values on the Y-axis. Apart from the rotation argument, there
are a number of other parameters that can be provided to xticks and yticks
to customize them further. We change the font size, color and orientation
of the ticks along the axes using the appropriate arguments within these
methods in the following example:
# -Example 32-
# Creating a figure, setting its size and plotting close
# prices on it
fig = plt.figure(figsize=(10, 5))
plt.plot(close_prices, color='purple')
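A self-contained sketch of these tick and label customizations is shown below. The price series is synthetic, and the rotation angle, font sizes, colors and label texts are assumptions picked for illustration rather than the values used for figure 32.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in IPython
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# ASSUMPTION: made-up close prices over 250 business days
dates = pd.date_range('2019-01-01', periods=250, freq='B')
close_prices = pd.Series(100 + np.random.randn(250).cumsum(), index=dates)

# A wider figure leaves more room for the date labels
fig = plt.figure(figsize=(10, 5))
plt.plot(close_prices, color='purple')

# Rotate the tick labels and change their color and font size
plt.xticks(rotation=45, color='teal', size=12)
plt.yticks(rotation=45, color='teal', size=12)

# Axis labels accept the same kind of text properties
plt.xlabel('Dates', color='purple', size=15)
plt.ylabel('Prices', color='purple', size=15)
```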
Figure 32: Line plot with rotated ticks on axes and colored values
Along with the axes values, we change the color and font size of the axes
labels, as shown in figure 32. A number of other customizations are
possible using various arguments, and matplotlib provides the flexibility
to create charts as per one’s desire. Two main components that are missing
in the above plot are the title and the legend, which can be provided using
the methods title and legend respectively. Again, as with the other
methods, it is possible to customize them in a variety of ways, but we will
restrict our discussion to a few key arguments only. Adding these two
methods, as shown below, to the above code would produce the plot shown in
figure 33:
# -Example 33-
# Showing legends and setting the title of plot
plt.legend()
plt.title('AAPL Close Prices', color='purple', size=20)
# -Example 34-
# Adding the grid to the plot
plt.grid(True)
Figure 33: Line plot with legends and the title
The axhline method allows us to add a horizontal line across the axis to
the plot. For example, we might add the mean value of the close prices to
show the average price of the stock over the whole duration. The
computation of the mean value and its addition to the original plot is
shown below:
# -Example 35-
# Importing NumPy library
import numpy as np
Now that we have the mean value of close prices plotted in figure 35,
someone who looks at the chart for the first time might wonder what this red
line conveys. Hence, there is a need to label it explicitly. To do so, we
can use the text method provided by the matplotlib.pyplot module to place
text anywhere on the plot.
# -Example 36-
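The grid, mean line and text annotation can be combined in one small sketch. The price series is synthetic; the dashed line style, the wording of the annotation and its position are assumptions, while the red color follows the description in the text.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in IPython
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# ASSUMPTION: made-up close prices over 250 business days
dates = pd.date_range('2019-01-01', periods=250, freq='B')
close_prices = pd.Series(100 + np.random.randn(250).cumsum(), index=dates)

fig = plt.figure(figsize=(10, 5))
plt.plot(close_prices, color='purple', label='Prices')
plt.grid(True)

# Horizontal red line at the mean close price
mean_price = np.mean(close_prices)
hline = plt.axhline(mean_price, color='r', linestyle='dashed')

# Text placed in data coordinates: at the first date, on the mean line
label = plt.text(close_prices.index[0], mean_price, 'Mean price',
                 color='r', size=12, va='bottom')

plt.legend()
plt.title('AAPL Close Prices', color='purple', size=20)
```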
Figure 34: Line plot with a grid
Figure 36: Line plot with text on it
Using all these customization techniques, we have been able to evolve the
dull-looking price series chart into an attractive graphic that is not only
easy to understand but presentable too. However, we have restricted
ourselves to plotting only a single chart. Let us now learn to apply these
newly acquired customization techniques to multiple plots. We already
learned at the beginning of this chapter that a figure can have multiple
plots, and that this can be achieved using the subplots method. The
following examples show the prices of AAPL stock along with its traded
volume on each day. We start with a simple plot of stock prices and
volumes in the below example:
# -Example 37-
# Extracting volume from the dataframe 'data'
volume = data['AdjVolume']
First, we extract the AdjVolume column from the data dataframe into
volume, which is a pandas Series object. Then, we create a figure with
sub-plots arranged in two rows and a single column. This is achieved using
the nrows and ncols arguments respectively. The sharex argument specifies
that both sub-plots share the same x-axis. Likewise, we specify the figure
size using the figsize argument. The two subplots are unpacked into two
axes, ax1 and ax2. Once we have the axes, the desired charts can be plotted
on them.
Next, we plot the close_prices using the plot method and specify its color
to be purple using the color argument. Similar to the plot method,
matplotlib provides the bar method to draw bar plots, which takes two
arguments: the first is plotted along the X-axis and the second along the
Y-axis. In our example, the values on the X-axis are dates (specified by
volume.index), and the value for each bar on the Y-axis is provided by the
recently created volume series. After that, we plot grids
Figure 37: Sub-plots with stock price and volume
on both plots. Finally, we display both plots. As can be seen above in
figure 37, matplotlib rendered a decent chart. However, it lacks some key
components such as a title, legends, etc. These components are added in the
following example:
# -Example 38-
# Creating figure with multiple plots
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1,
                               sharex=True,
                               figsize=(10, 8))
ax1.plot(close_prices, color='purple', label='Prices')
ax1.grid(True)
Figure 38: Sub-plots with legends and titles
plt.show()
Here, we use the legend method to set legends in both plots, as shown in
figure 38. Legends display the values specified by the label argument when
plotting each chart. The set_title method is used to set the title for each
plot. Earlier, while dealing with a single plot, we used the title method
to set the title. However, it doesn’t work the same way with multiple
plots.
Figure 39: Sub-plots with tight layout
method which automatically adjusts the padding and other similar
parameters between subplots so that they fit into the figure area.
# -Example 39-
# Setting layout
plt.tight_layout()
The above code explicitly specifies the layout and the label on the
x-axis, which results in the chart shown in figure 39.
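Putting the sub-plot steps together, a complete sketch could look like this. The price and volume series are synthetic, and the titles and label texts are assumptions; the structure (two rows, shared x-axis, a line plot above a bar plot, legends, per-axes titles and a tight layout) follows the description in the text.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in IPython
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# ASSUMPTION: made-up prices and volumes for 60 calendar days
dates = pd.date_range('2019-01-01', periods=60, freq='D')
close_prices = pd.Series(100 + np.random.randn(60).cumsum(), index=dates)
volume = pd.Series(np.random.randint(1_000_000, 5_000_000, 60), index=dates)

# Two rows, one column, both sub-plots sharing the same x-axis
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1,
                               sharex=True,
                               figsize=(10, 8))

# Upper sub-plot: prices as a line
ax1.plot(close_prices, color='purple', label='Prices')
ax1.grid(True)
ax1.set_title('Daily Prices')
ax1.legend()

# Lower sub-plot: volume as bars, dates on the x-axis
ax2.bar(volume.index, volume, label='Volume')
ax2.grid(True)
ax2.set_title('Traded Volume')
ax2.legend()

plt.xlabel('Dates')   # applies to the current axes, here the lower one
plt.tight_layout()    # adjust padding so that nothing overlaps
```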
rendered, the new style needs to be explicitly specified using the following
code:
plt.style.use('ggplot')
Once the style is set, all plots rendered after that will use the newly
set style. To list all available styles, execute the following code:
plt.style.available
Let us set the style to one of the pre-defined styles known as ’fivethirtyeight’
and plot the chart.
# -Example 40-
plt.style.use('fivethirtyeight')
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1,
                               sharex=True,
                               figsize=(10, 8))
plt.tight_layout()
plt.xlabel('Dates')
plt.show()
The output of the above code is shown in figure 40. Changing the style
gives us a fair idea of how styles change the look of charts cosmetically.
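Since plt.style.use changes the style for every later plot, it can be useful to apply a style only temporarily. The sketch below uses plt.style.context for that purpose; this function is not covered in the text above and is introduced here as an aside, and the plotted numbers are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in IPython
import matplotlib.pyplot as plt

# Names of all styles shipped with the installed matplotlib
styles = plt.style.available
print(len(styles))

# The style applies only inside the 'with' block; global settings
# are restored once the block ends
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    ax.plot([1, 3, 2, 4])
    ggplot_grid = plt.rcParams['axes.grid']  # ggplot turns the grid on
```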
Figure 40: Plot with pre-defined style 'fivethirtyeight'
The last method that we will study is the savefig method, which is used to
save the figure on a local machine. It takes the file name under which the
figure will be saved. This is illustrated below:
plt.savefig('AAPL_chart.png')
Executing the above code will save the chart we plotted above under the
name AAPL_chart.png.
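A slightly fuller sketch of saving a figure is given below. Writing to the system's temporary directory, the dpi value and the bbox_inches setting are assumptions made for illustration; only the file name comes from the text.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in IPython
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([95, 42, 69, 11, 49])  # made-up values

# ASSUMPTION: save into the temporary directory to avoid cluttering
# the working directory; any writable path works
out_path = os.path.join(tempfile.gettempdir(), 'AAPL_chart.png')

# dpi controls the resolution; bbox_inches='tight' trims extra whitespace
fig.savefig(out_path, dpi=150, bbox_inches='tight')
saved = os.path.exists(out_path)
print(saved)
```

The output format is inferred from the file extension, so a name ending in .pdf or .svg would produce a vector file instead.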
This brings us to the end of this chapter. We started with the basics of
figures and plots, and gradually learned various types of charts along
with their finer details.
We also learned customization and took a sneak peek into plotting multiple
plots within the same figure.