Python For Finance - The Complete Beginner's Guide - by Behic Guven - Jul, 2020 - Towards Data Science PDF


Python for Finance — The Complete Beginner’s Guide

towardsdatascience.com/python-for-finance-the-complete-beginners-guide-764276d74cef

July 28, 2020


Welcome to The Complete Beginner’s Guide to Python for Finance.

In this post, I will walk you through some hands-on exercises that will help you understand how to use Python for finance. First, I’ll introduce you to our friend Python, and then we will get to the fun part: programming. As mentioned in the subtitle, we will be using Amazon stock data. If you are wondering whether that data is free, the answer is absolutely yes. The stock data is available on the official NASDAQ website. The NASDAQ (National Association of Securities Dealers Automated Quotations) is an electronic stock exchange with more than 3,300 company listings.

The Amazon stock data can be downloaded from the NASDAQ website. There you can also find stock data for other companies and practice your skills using those datasets. I can’t wait; let’s get started!

Table of Contents:

Python
Understanding the Amazon Stock Data
Data Cleaning
Data Visualization

Python
Python is a general-purpose programming language that is becoming ever more popular for analyzing data. Python also lets you work quickly and integrate systems more effectively. Companies all around the world use Python to gain insights from their data. Visit the official Python website if you want to learn more.
Understanding the Data
When you first load data into a dataframe, it is good practice to take a look at it before you start manipulating it. This helps you confirm that you have the right data and gives you some first insights about it. As mentioned earlier, for this exercise we will be using the historical data of a company listed on NASDAQ. I thought Amazon would be a good one to go with. After working through this exercise with me, you will have the skills to practice on your own with different datasets.

The dataframe that we will be using contains the daily prices (close, open, high, and low) and trading volume of Amazon stock for the last month (June 24 to July 23, 2020).

Read Data

import pandas as pd
amzn = pd.read_csv('amzn_data.csv')
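If you’d like to follow along without the downloaded file, here is a minimal sketch that reads a tiny stand-in CSV from memory with io.StringIO. The values are made up for illustration, and the column names simply mirror the ones we assign later in this post; the headers in the real NASDAQ file may differ.

```python
import io

import pandas as pd

# Hypothetical stand-in for amzn_data.csv: made-up values, most recent day first,
# with a dollar sign in front of every price.
csv_text = """Date,Close,Volume,Open,High,Low
07/23/2020,$3000.50,6000000,$3100.00,$3120.00,$2990.00
07/22/2020,$3090.00,4000000,$3050.00,$3100.00,$3030.00
07/21/2020,$3010.00,5500000,$3080.00,$3095.00,$3000.00
"""

amzn = pd.read_csv(io.StringIO(csv_text))

# Because of the "$" prefix, the price columns are read as text rather than numbers;
# Volume is parsed as an integer column.
print(amzn.dtypes)
```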

Head Method
The first thing we’ll do to get some understanding of the data is to use the head method. When you call the head method on the dataframe, it displays the first five rows. After running this method, we can also see that our data is sorted by date.

amzn.head()


Tail Method
Another helpful method is the tail method, which displays the last five rows of the dataframe. If you want to see, say, only the last three rows, pass the integer 3 between the parentheses: amzn.tail(3).

amzn.tail()
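Here is a quick sketch of that integer argument on a small made-up frame: head() and tail() return five rows by default, and any integer you pass changes that count.

```python
import pandas as pd

# A small hypothetical frame with ten rows (index 0 to 9).
frame = pd.DataFrame({"Volume": range(10)})

first_rows = frame.head()   # default: first 5 rows
last_three = frame.tail(3)  # pass an integer to change the number of rows

print(len(first_rows), list(last_three.index))
```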

Describe Method
The last method we’ll call before we go deeper is the describe method. It returns a statistical summary of our data. By default, the describe method returns summary statistics for numeric columns only. The summary includes the following items: the count of non-null values, mean, standard deviation, minimum and maximum values, and the 25th, 50th, and 75th percentiles.
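To see where those numbers come from, here is a small sketch that compares describe with the same statistics computed one by one, on a made-up volume column. One detail worth knowing: the std row is the sample standard deviation (ddof=1), which is also what Series.std() returns by default.

```python
import pandas as pd

# Made-up daily volumes, for illustration only.
volume = pd.Series([4_000_000, 5_500_000, 6_000_000, 5_000_000], name="Volume")

summary = volume.describe()
print(summary)

# The same numbers computed individually: count of non-null values, mean,
# sample standard deviation (ddof=1), and the median (the 50% row).
print(volume.count(), volume.mean(), volume.std(ddof=1), volume.quantile(0.5))
```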

amzn.describe()

describe method

Why are we getting values only for the Volume column and not for the other columns? This is where Data Preparation comes in. Cleaning data and making it ready for analysis is a major step, and there are a couple of things we have to take care of before moving on. Feel free to check the post below to learn more about data cleaning when working with different column data types.

Cleaning Data in Python (Data Types)

A simple explanation of manipulating data types using pandas


medium.com

Data Cleaning
We mentioned earlier that, by default, the describe method works only with numeric columns, which means that Volume was the only numeric column in our dataframe. Let’s check the data types of our columns.

amzn.dtypes

column data types

As you can see above, Volume is the only integer column and the rest are object (text) columns, so we have to fix the data types. Before converting them, though, we have to remove the dollar sign; otherwise, pandas will raise an error when it tries to convert a value that starts with a dollar sign into a number.
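A minimal sketch of that failure on a single made-up column: the conversion raises a ValueError while the dollar sign is still there, and succeeds once it is stripped. This uses Series.str.replace on one column; the post itself uses DataFrame.replace on the whole dataframe.

```python
import pandas as pd

# Made-up closing prices as they look before cleaning.
prices = pd.Series(["$3000.12", "$2999.50"])

conversion_failed = False
try:
    prices.astype(float)  # "$3000.12" is not a valid float literal
except ValueError as err:
    conversion_failed = True
    print("Conversion failed:", err)

# Remove the literal "$" character; now the conversion succeeds.
cleaned = prices.str.replace("$", "", regex=False).astype(float)
print(cleaned.tolist())
```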

amzn = amzn.replace({r'\$': ''}, regex=True)
amzn.head()

after removing the dollar sign

Good, now we can convert the data types. First we rename the columns to short names. We don’t need to change anything in the Date and Volume columns; we will convert the remaining columns to a numeric type, and for this exercise float works well.

# Rename the columns to short, consistent names
df = amzn.copy()
df.columns = ['Date', 'Close', 'Volume', 'Open', 'High', 'Low']
# Convert the price columns to float and Volume to int
df = df.astype({"Close": float, "Volume": int, "Open": float, "High": float, "Low": float})
df.dtypes
data types after the change
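Putting both cleaning steps together on a small made-up frame gives a quick way to check the result before touching the real file. The values are hypothetical; the column names match the ones assigned above.

```python
import pandas as pd

# Hypothetical raw data, as it might look straight after read_csv.
raw = pd.DataFrame({
    "Date": ["07/23/2020", "07/22/2020"],
    "Close": ["$3000.50", "$3090.00"],
    "Volume": [6000000, 4000000],
    "Open": ["$3100.00", "$3050.00"],
    "High": ["$3120.00", "$3100.00"],
    "Low": ["$2990.00", "$3030.00"],
})

# Step 1: strip the dollar sign. The raw string r"\$" matches a literal "$";
# the integer Volume column is left untouched.
cleaned = raw.replace({r"\$": ""}, regex=True)

# Step 2: convert the price columns to float.
cleaned = cleaned.astype({"Close": float, "Open": float, "High": float, "Low": float})

print(cleaned.dtypes)
```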

Great, we solved the data type issue. Now, let’s try to run the describe method and see how it works.

df.describe()


Well done! Now as you can see above, the describe method worked perfectly with all our numeric columns. We can
also customize our results for the describe method by using different parameters. Describe has three parameters that
we will use in this example: include, percentiles, and exclude.

df.describe(include = "float")

df.describe(include = "object")


df.describe(exclude = "int")


df.describe(percentiles = [0.1, 0.5, 0.9])


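Since the output tables are easier to compare side by side, here is a small sketch of the include, exclude, and percentiles parameters on a made-up cleaned frame. The data is hypothetical; the calls are the same ones used above.

```python
import pandas as pd

# Hypothetical cleaned frame: one text column, two float columns, one int column.
df = pd.DataFrame({
    "Date": ["07/21/2020", "07/22/2020", "07/23/2020"],
    "Close": [3010.0, 3090.0, 3000.5],
    "Volume": [5500000, 4000000, 6000000],
    "Open": [3080.0, 3050.0, 3100.0],
})

floats_only = df.describe(include="float")          # only Close and Open
no_ints = df.describe(exclude="int")                # every column except Volume
custom = df.describe(percentiles=[0.1, 0.5, 0.9])   # 10%, 50%, 90% rows

print(list(floats_only.columns))
print(list(no_ints.columns))
print(list(custom.index))
```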

Filtering the Data

Comparison Operators
<
>
<=
>=
==
!=

We will use these operators to compare a specific value to the values in a column. The result is a series of booleans: True where the comparison holds and False where it does not.
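As a small sketch, here are all six operators applied to a made-up Close column. The operator module is used only to loop over them; writing close > 3000 directly gives the same result.

```python
import operator

import pandas as pd

# Made-up closing prices, for illustration only.
close = pd.Series([2950.0, 3000.0, 3090.0], name="Close")

# Each comparison is applied element-wise and returns a boolean Series.
comparisons = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}

for symbol, compare in comparisons.items():
    print(f"Close {symbol} 3000:", compare(close, 3000).tolist())
```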

Masking by the Closing Price


When we pass a boolean series to a dataframe using the loc[] indexer, pandas returns a new dataframe containing only the rows where the series is True.

# Closing price more than 3000
mask_closeprice = df.Close > 3000
high_price = df.loc[mask_closeprice]
high_price.head()

Pandas offers operators to combine the results of our boolean comparisons: and (&), or (|), and not (~). We can use these operators to create more complex conditions. For example, let’s say we want to see the AMZN stock data where the closing price is more than 3000 and the volume is more than 5 million. Here is how we do it:

# Closing price more than 3000 and traded volume more than 5 million
mask_closeprice = df.Close > 3000
mask_volume = df.Volume > 5000000
high_price_volume = df.loc[mask_closeprice & mask_volume]
high_price_volume.head()

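A short sketch of all three combinators on made-up rows. One pitfall worth showing: when you write the conditions inline instead of storing them in variables first, wrap each comparison in parentheses, because in Python & and | bind more tightly than >.

```python
import pandas as pd

# Made-up rows, for illustration only.
df = pd.DataFrame({
    "Close": [2950.0, 3050.0, 3090.0, 3120.0],
    "Volume": [6000000, 4000000, 5500000, 7000000],
})

high_price = df.Close > 3000
high_volume = df.Volume > 5000000

both = df.loc[high_price & high_volume]    # and: both conditions hold
either = df.loc[high_price | high_volume]  # or: at least one condition holds
not_high = df.loc[~high_price]             # not: invert the mask

# Inline form: the parentheses are required because & binds tighter than >.
inline = df.loc[(df.Close > 3000) & (df.Volume > 5000000)]

print(both.index.tolist(), either.index.tolist(), not_high.index.tolist())
```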

Data Visualization
Visualizing the data is an important step in understanding it. It lets us see more than just rows of values and gives us a better picture of the data. It is also helpful when we want to make comparisons between different data values.

Visualizing the data is also a great way to understand and see the relationships between different columns.

Matplotlib
The most commonly used 2D plotting library is Matplotlib. This library is very powerful, but it also has a learning curve. Partly because of that, other libraries have been built on top of it; the pandas plot method we use below, for example, draws its charts with Matplotlib by default.
Let’s plot the stock prices for the last month. The x-axis will be the date and the y-axis will be the closing price on each day. This shows how the stock price changes during the one-month period. From a business point of view, this line plot is called a price fluctuation plot; over a longer time range, it can also help detect seasonal patterns in the stock price.

df.plot(x='Date', y='Close')

line plot
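One thing worth checking before plotting: if Date is still a text column, the x-axis simply follows the row order of the file, so rows listed from newest to oldest would run backwards in time. A minimal sketch, assuming the dates are month/day/year strings (the format string is an assumption about the file), converts them with pd.to_datetime and sorts oldest to newest. The rows are made up.

```python
import pandas as pd

# Hypothetical rows listed most recent first, with dates stored as text.
df = pd.DataFrame({
    "Date": ["07/23/2020", "07/22/2020", "07/21/2020"],
    "Close": [3000.5, 3090.0, 3010.0],
})

# Parse the strings into datetimes (assumes month/day/year order),
# then sort oldest to newest so the x-axis reads left to right in time.
prices = df.assign(Date=pd.to_datetime(df["Date"], format="%m/%d/%Y"))
prices = prices.sort_values("Date").reset_index(drop=True)

print(prices)
```

With the sorted frame, prices.plot(x='Date', y='Close') should draw the same line plot in calendar order.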

Rotate
The plot method offers a lot of interesting parameters that you can try out. One of them is rot, which rotates the tick labels of the plot. Here is an example of rotating the date labels 90 degrees so that they are easier to read.

df.plot(x='Date', y='Close', rot=90)


line plot

Title
If you want to give your plot a title, use the title parameter and pass it a string.

df.plot(x='Date', y='Close', rot=90, title="AMZN Stock Price")

line plot

More Plot Types


The plot method draws a line plot by default, but many other plot types are available depending on our use case. Some of them are:

Line
Bar
Pie
Scatter
Histogram

Let’s try a scatter plot. All we have to do is add a new parameter called kind to our method call. Yes, it is that easy.

df.plot(x='Date', y='Close', kind='scatter', rot=90, title="AMZN Stock Price")

scatter plot

Now let’s try a histogram. A histogram is a great way to see the distribution of values; here we look at the distribution of daily trading volume.

df.plot(x='Date', y='Volume', kind='hist', rot=90, title="AMZN Stock Price")


histogram plot
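Under the hood, a histogram splits the range of values into equal-width bins and counts how many values fall into each one. A minimal sketch with NumPy’s np.histogram on made-up volume numbers, using 10 bins (the default bin count of the pandas hist plot):

```python
import numpy as np

# Made-up daily volumes, for illustration only.
volume = np.array([4_000_000, 4_500_000, 5_000_000, 5_200_000, 6_000_000, 7_500_000])

# Split [min, max] into 10 equal-width bins and count the values in each bin.
counts, edges = np.histogram(volume, bins=10)

print(counts)  # one count per bin
print(edges)   # 11 bin edges, from the minimum to the maximum volume
```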

Thank you for reading this post. I hope you enjoyed it and learned something new today. Feel free to contact me through my blog if you have any questions while implementing the code; I will be more than happy to help. You can also find more posts I’ve published related to Python and Machine Learning. Stay safe and happy coding!

I am Behic Guven, and I love sharing stories on creativity, programming, motivation, and life.

Follow my blog and Towards Data Science to stay inspired.
