Python For Finance - The Complete Beginner's Guide - by Behic Guven - Jul, 2020 - Towards Data Science PDF
towardsdatascience.com/python-for-finance-the-complete-beginners-guide-764276d74cef
July 28, 2020
In this post, I will walk you through some hands-on exercises that will help you understand how to use Python for finance. First, I'll introduce you to our friend Python, and then we will get to the fun part: programming. We will be using Amazon stock data. If you are wondering whether that data is free to get, the answer is absolutely yes. The stock data is available on the official NASDAQ website. The NASDAQ (National Association of Securities Dealers Automated Quotations) is an electronic stock exchange with more than 3,300 company listings.
The Amazon stock data can be downloaded from the NASDAQ website. There you can also find stock data for other companies and practice your skills on those datasets. I can't wait; let's get started!
Table of Contents:
Python
Understanding the Amazon Stock Data
Data Cleaning
Data Visualization
Python
Python is a general-purpose programming language that is becoming ever more popular for analyzing data. Python also lets you work quickly and integrate systems more effectively. Companies all around the world use Python to gain insights from their data. Visit the official Python page (python.org) if you want to learn more.
Understanding the Data
When you first load data into a dataframe, it is good practice to take a look at it before you start manipulating it. This helps confirm that you have the right data and gives you some first insights about it. As mentioned earlier, for this exercise we will be using historical data of a company listed on NASDAQ; I thought Amazon would be a good one to go with. After working through this exercise with me, you will have the skills to practice on your own with different datasets.
The dataframe that we will be using contains the daily closing, opening, high, and low prices and the trading volume of Amazon stock for one month (June 24 to July 23, 2020).
Read Data
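The post does not show the loading step. Below is a minimal sketch with pandas.read_csv, assuming the NASDAQ download is a CSV with the columns Date, Close/Last, Volume, Open, High, Low (newest date first, prices stored as text with a leading dollar sign); that layout and the file name HistoricalQuotes.csv are assumptions. So that it runs on its own, it reads a few made-up placeholder rows, not real Amazon prices, from a string.

```python
import io

import pandas as pd

# Made-up placeholder rows in the layout assumed for the NASDAQ download:
# newest date first, prices stored as text with a leading "$".
SAMPLE_CSV = """Date,Close/Last,Volume,Open,High,Low
07/23/2020,$3000.00,5000000,$3100.00,$3150.00,$2950.00
07/22/2020,$3100.00,4000000,$3050.00,$3120.00,$3000.00
07/21/2020,$2900.00,6000000,$2950.00,$3000.00,$2880.00
"""

# With the real file this would be pd.read_csv("HistoricalQuotes.csv"),
# where the file name is hypothetical.
amzn = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(amzn.dtypes)
```

Because of the dollar signs, only Volume is read as a number; the price columns come in as text, which is exactly what the data cleaning step later deals with.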
Head Method
The first thing we'll do to get some understanding of the data is to use the head method. When you call the head method on the dataframe, it displays the first five rows. The output also shows that our data is sorted by date.
amzn.head()
Tail Method
Another helpful method is tail, which displays the last five rows of the dataframe. If you want to see only the last three rows, pass the integer 3 between the parentheses: amzn.tail(3).
amzn.tail()
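A small sketch of the n argument of both methods, using a hypothetical ten-row frame in place of the Amazon data:

```python
import pandas as pd

# Hypothetical ten-row frame standing in for the Amazon data
amzn = pd.DataFrame({
    "Date": [f"07/{day:02d}/2020" for day in range(1, 11)],
    "Volume": list(range(10)),
})

first_five = amzn.head()   # n defaults to 5
last_three = amzn.tail(3)  # pass n to choose how many rows come back
print(first_five)
print(last_three)
```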
Describe Method
The last method we'll call before we dig deeper is describe. It returns a statistical summary of our data. By default, describe summarizes only the numeric columns of the dataframe. The summary includes the following items: the count of non-missing values, the mean, the standard deviation, the minimum and maximum values, and the 25th, 50th, and 75th percentiles.
amzn.describe()
describe method
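To see this default in isolation, here is a sketch with two made-up rows where the price is still stored as text (the column name Close/Last is an assumption about the download). Only the integer Volume column ends up in the summary:

```python
import pandas as pd

# Two made-up rows; the price column is still text, Volume is an integer
amzn = pd.DataFrame({
    "Date": ["07/23/2020", "07/22/2020"],
    "Close/Last": ["$3000.00", "$3100.00"],
    "Volume": [5000000, 4000000],
})

summary = amzn.describe()  # numeric columns only by default
print(summary)
```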
Why are we getting values only for the Volume column and not the other columns? This is where data preparation comes in. Cleaning data and making it ready for analysis is a major step, and there are a couple of things we have to take care of before moving on.
Data Cleaning
We mentioned earlier that the describe method works on numeric columns by default, which means the Volume column was the only numeric column in our dataframe. Let's check the data types of our columns.
amzn.dtypes
As you can see above, Volume is the only integer column, and the rest are of object type, so we have to fix the data types. But before converting them, we have to remove the dollar sign; otherwise, the conversion will fail, because a string such as "$3000.00" cannot be parsed as a number.
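The cleaning code is not included in the post. One way to do it, sketched under the assumption that the price columns are named Close/Last, Open, High, and Low and hold strings like "$3000.00" (placeholder values below), is Series.str.replace with regex=False, so the dollar sign is treated as a literal character rather than a regular-expression anchor:

```python
import pandas as pd

# Placeholder rows: prices arrive as text with a "$" prefix
amzn = pd.DataFrame({
    "Date": ["07/23/2020", "07/22/2020"],
    "Close/Last": ["$3000.00", " $3100.00"],
    "Volume": [5000000, 4000000],
    "Open": ["$3100.00", "$3050.00"],
    "High": ["$3150.00", "$3120.00"],
    "Low": ["$2950.00", "$3000.00"],
})

price_columns = ["Close/Last", "Open", "High", "Low"]  # assumed column names
for column in price_columns:
    # regex=False makes "$" a literal character, not an end-of-string anchor;
    # strip() removes any leading or trailing spaces left behind
    amzn[column] = amzn[column].str.replace("$", "", regex=False).str.strip()

print(amzn.head())
```

After this step the price columns still hold strings, but strings such as "3000.00" that astype(float) can convert.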
Good, now we can convert the data types. We don't need to change anything in the Date and Volume columns. We will convert the remaining columns to a numeric type; for this exercise, float works well. The code also gives the columns short, consistent names.
# Renaming column names and converting the data types
df = amzn
df.columns = ['Date', 'Close', 'Volume', 'Open', 'High', 'Low']

# Converting data types
df = df.astype({"Close": float, "Volume": int, "Open": float, "High": float, "Low": float})
df.dtypes
data types after the change
Great, we solved the data type issue. Now, let’s try to run the describe method and see how it works.
df.describe()
Well done! As you can see above, the describe method now works with all our numeric columns. We can also customize the results of describe with its parameters. Three of them are handy here: include, exclude, and percentiles.
df.describe(include = "float")
df.describe(include = "object")
df.describe(exclude = "int")
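The percentiles parameter is not demonstrated above. A minimal sketch on a made-up three-row frame: pass a list of fractions between 0 and 1, and each one becomes a row of the summary. The median (0.5) is listed explicitly here to keep the output predictable.

```python
import pandas as pd

# Made-up three-row frame after the type conversion
df = pd.DataFrame({
    "Date": ["07/21/2020", "07/22/2020", "07/23/2020"],
    "Close": [2900.0, 3100.0, 3000.0],
    "Volume": [6000000, 4000000, 5000000],
})

# Report the 10th, 50th and 90th percentiles instead of the default 25/50/75
summary = df.describe(percentiles=[0.1, 0.5, 0.9])
print(summary)
```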
Comparison Operators
<
>
<=
>=
==
!=
We will use these operators to compare a specific value to the values in a column. The result is a Series of booleans: True where the comparison holds and False where it does not.
# Closing price more than 3000
mask_closeprice = df.Close > 3000
high_price = df.loc[mask_closeprice]
high_price.head()
Pandas offers operators to combine the results of boolean comparisons: & (and), | (or), and ~ (not). We can use them to create more complex conditions. For example, let's say we want to see the AMZN stock data where the closing price is more than 3000 and the volume is more than 5 million. Here is how we do it:
# Closing price more than 3000 and traded volume more than 5 million
mask_closeprice = df.Close > 3000
mask_volume = df.Volume > 5000000
high_price_volume = df.loc[mask_closeprice & mask_volume]
high_price_volume.head()
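A sketch of all three operators on made-up values. One detail worth knowing: in Python, & and | bind more tightly than comparison operators such as >, so each comparison written inline needs its own parentheses.

```python
import pandas as pd

# Made-up values for illustration
df = pd.DataFrame({
    "Close": [2900.0, 3100.0, 3000.5, 3200.0],
    "Volume": [6000000, 4000000, 5500000, 7000000],
})

# Parentheses are required: & and | bind more tightly than >
both = df.loc[(df.Close > 3000) & (df.Volume > 5000000)]    # and
either = df.loc[(df.Close > 3000) | (df.Volume > 5000000)]  # or
not_high = df.loc[~(df.Close > 3000)]                       # not

print(both, either, not_high, sep="\n\n")
```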
Data Visualization
Visualizing the data is an important step in understanding it. It lets us see more than just rows of values and gives us a better picture of the data. It is also helpful when we want to compare different data values.
Visualizing the data is also a great way to understand and see the relationships between different columns.
Matplotlib
The most commonly used 2D plotting library in Python is Matplotlib. This library is very powerful, but it also has a steep learning curve. Because of that, other tools have been built on top of it; the pandas plot method we use below is a convenient wrapper around Matplotlib.
Let's plot the stock prices for the one-month period. Our x-axis will be the date and the y-axis will be the closing price on each day. This shows how the stock price changes during the month. From a business point of view, this kind of line plot is often called a price fluctuation plot; over longer periods, it can help reveal seasonal patterns in the stock price.
df.plot(x='Date', y='Close')
line plot
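One thing to watch: if Date is still a text column, pandas plots the rows in file order, so data stored newest first is drawn with time running backwards. A minimal sketch that parses the dates with pandas.to_datetime and sorts them before plotting, assuming the MM/DD/YYYY format and newest-first order of the NASDAQ file (placeholder prices below):

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend; not needed in a notebook

import pandas as pd

# Placeholder rows, newest first as assumed for the NASDAQ file
df = pd.DataFrame({
    "Date": ["07/23/2020", "07/22/2020", "07/21/2020"],
    "Close": [3000.0, 3100.0, 2900.0],
})

# Parse the text dates (assumed MM/DD/YYYY) and order them oldest to newest
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
df = df.sort_values("Date")

ax = df.plot(x="Date", y="Close")
```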
Rotate
The plot method offers many interesting parameters that you can try out. One of them is rot, which rotates the tick labels on the x-axis. For example, rotating the date labels 90 degrees makes them easier to read.
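A sketch of the rot parameter on placeholder values; rot takes the rotation angle of the x-axis tick labels in degrees:

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend; not needed in a notebook

import pandas as pd

# Placeholder values
df = pd.DataFrame({
    "Date": ["06/24/2020", "06/25/2020", "06/26/2020"],
    "Close": [2900.0, 3000.0, 3100.0],
})

# rot: rotation of the x-axis tick labels, in degrees
ax = df.plot(x="Date", y="Close", rot=90)
```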
Title
If you want to give your plot a title, the title parameter is what you need: pass it a string.
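A sketch with a title added; the title text and the placeholder prices are made up:

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend; not needed in a notebook

import pandas as pd

# Placeholder values
df = pd.DataFrame({
    "Date": ["06/24/2020", "06/25/2020", "06/26/2020"],
    "Close": [2900.0, 3000.0, 3100.0],
})

# title: a string shown above the plot (the text here is made up)
ax = df.plot(x="Date", y="Close", rot=90, title="Amazon stock closing price")
```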
line plot
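Besides line plots, the kind parameter of the plot method lets you choose other plot types, for example:
Line (kind='line', the default)
Bar (kind='bar')
Pie (kind='pie')
Scatter (kind='scatter')
Histogram (kind='hist')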
Let's try a scatter plot. All it takes is one new parameter in our method, called kind. Yes, it is that easy.
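The post does not say which columns it scattered. As a hypothetical choice, this sketch plots closing price against trading volume, two numeric columns, using made-up values:

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend; not needed in a notebook

import pandas as pd

# Made-up values for illustration
df = pd.DataFrame({
    "Close": [2900.0, 3100.0, 3000.5, 3200.0],
    "Volume": [6000000, 4000000, 5500000, 7000000],
})

# Same plot method, plus kind="scatter": one point per trading day
ax = df.plot(x="Volume", y="Close", kind="scatter")
```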
scatter plot
Now let's try a histogram. A histogram is a great way to see how values are distributed.
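A sketch of a histogram of closing prices; the values are made up, and bins=5 is an arbitrary choice that controls how many bars the value range is split into:

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend; not needed in a notebook

import pandas as pd

# Made-up closing prices
df = pd.DataFrame({"Close": [2900.0, 2950.0, 3000.0, 3050.0, 3100.0, 3150.0]})

# kind="hist" counts how many values fall into each of the 5 bins
ax = df.plot(y="Close", kind="hist", bins=5)
```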
Thank you for reading this post; I hope you enjoyed it and learned something new today. Feel free to contact me through my blog if you have any questions while implementing the code; I will be more than happy to help. You can also find more posts I've published related to Python and machine learning. Stay safe and happy coding!
I am Behic Guven, and I love sharing stories on creativity, programming, motivation, and life.