Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
176 views

Introduction To Data Visualization in Python - by Gilbert Tanner - Towards Data Science

The document is an introduction to data visualization in Python using Matplotlib, Pandas, and Seaborn. It discusses popular Python plotting libraries and how to create basic plots like scatter plots, line charts, histograms, and bar charts using Matplotlib. It provides code examples to generate each type of plot using the Iris and wine review datasets.

Uploaded by

nomiman
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
176 views

Introduction To Data Visualization in Python - by Gilbert Tanner - Towards Data Science

The document is an introduction to data visualization in Python using Matplotlib, Pandas, and Seaborn. It discusses popular Python plotting libraries and how to create basic plots like scatter plots, line charts, histograms, and bar charts using Matplotlib. It provides code examples to generate each type of plot using the Iris and wine review datasets.

Uploaded by

nomiman
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 22

2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

Get started Open in app

Follow 548K Followers

Introduction to Data Visualization in Python


How to make graphs using Matplotlib, Pandas and Seaborn

Gilbert Tanner Jan 23, 2019 · 9 min read

Figure 1: Photo by Lukas Blazek on Unsplash

https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 1/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

Data visualization is the discipline of trying to understand data by placing it in a visual


Get started Open in app
context so that patterns, trends and correlations that might not otherwise be detected
can be exposed.

Python offers multiple great graphing libraries that come packed with lots of different
features. No matter if you want to create interactive, live or highly customized plots
python has an excellent library for you.

To get a little overview here are a few popular plotting libraries:

Matplotlib: low level, provides lots of freedom

Pandas Visualization: easy to use interface, built on Matplotlib

Seaborn: high-level interface, great default styles

ggplot: based on R’s ggplot2, uses Grammar of Graphics

Plotly: can create interactive plots

In this article, we will learn how to create basic plots using Matplotlib, Pandas
visualization and Seaborn as well as how to use some specific features of each library.
This article will focus on the syntax and not on interpreting the graphs, which I will
cover in another blog post.

In further articles, I will go over interactive plotting tools like Plotly, which is built on D3
and can also be used with JavaScript.

Importing Datasets
In this article, we will use two datasets which are freely available. The Iris and Wine
Reviews dataset, which we can both load in using pandas read_csv method.

1 import pandas as pd
2 iris = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width
3 print(iris.head())

https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 2/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

load_iris.py hosted with ❤ by GitHub view raw


Get started Open in app

Figure 2: Iris dataset head

1 wine_reviews = pd.read_csv('winemag-data-130k-v2.csv', index_col=0)


2 wine_reviews.head()

load_wine_reviews.py hosted with ❤ by GitHub view raw

Figure 3: Wine Review dataset head

Matplotlib

https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 3/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

Matplotlib is the most popular python plotting library. It is a low-level library with a
Get started Open in app
Matlab like interface which offers lots of freedom at the cost of having to write more
code.

To install Matplotlib pip and conda can be used.

pip install matplotlib


or
conda install matplotlib

Matplotlib is specifically good for creating basic graphs like line charts, bar charts,
histograms and many more. It can be imported by typing:

import matplotlib.pyplot as plt

Scatter Plot
To create a scatter plot in Matplotlib we can use the scatter method. We will also create
a figure and an axis using plt.subplots so we can give our plot a title and labels.

1 # create a figure and axis


2 fig, ax = plt.subplots()
3
4 # scatter the sepal_length against the sepal_width
5 ax.scatter(iris['sepal_length'], iris['sepal_width'])
6 # set a title and labels
7 ax.set_title('Iris Dataset')
8 ax.set_xlabel('sepal_length')
9 ax.set_ylabel('sepal_width')

matplotlib_simple_scatterplot.py hosted with ❤ by GitHub view raw

https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 4/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

Get started Open in app

Figure 4: Matplotlib Scatter plot

We can give the graph more meaning by coloring in each data-point by its class. This can
be done by creating a dictionary which maps from class to color and then scattering each
point on its own using a for-loop and passing the respective color.

1 # create color dictionary


2 colors = {'Iris-setosa':'r', 'Iris-versicolor':'g', 'Iris-virginica':'b'}
3 # create a figure and axis
4 fig, ax = plt.subplots()
5 # plot each data-point
6 for i in range(len(iris['sepal_length'])):
7 ax.scatter(iris['sepal_length'][i], iris['sepal_width'][i],color=colors[iris['class'][i]])
8 # set a title and labels
9 ax.set_title('Iris Dataset')
10 ax.set_xlabel('sepal_length')
11 ax.set_ylabel('sepal_width')

matplotlib_simple_scatterplot_with_colors.py hosted with ❤ by GitHub view raw

Figure 5: Scatter Plot colored by class

https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 5/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

Line Chart
Get started Open in app
In Matplotlib we can create a line chart by calling the plot method. We can also plot
multiple columns in one graph, by looping through the columns we want and plotting
each column on the same axis.

1 # get columns to plot


2 columns = iris.columns.drop(['class'])
3 # create x data
4 x_data = range(0, iris.shape[0])
5 # create figure and axis
6 fig, ax = plt.subplots()
7 # plot each column
8 for column in columns:
9 ax.plot(x_data, iris[column], label=column)
10 # set title and legend
11 ax.set_title('Iris Dataset')
12 ax.legend()

matplotlib_simple_linechart.py hosted with ❤ by GitHub view raw

Figure 6: Line Chart

Histogram
In Matplotlib we can create a Histogram using the hist method. If we pass it categorical
data like the points column from the wine-review dataset it will automatically calculate
how often each class occurs.

1 # create figure and axis

https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 6/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science
2 fig, ax = plt.subplots()
Get
3 started
# plot histogram
Open in app
4 ax.hist(wine_reviews['points'])
5 # set title and labels
6 ax.set_title('Wine Review Scores')
7 ax.set_xlabel('Points')
8 ax.set_ylabel('Frequency')

matplotlib_simple_histogram.py hosted with ❤ by GitHub view raw

Figure 7: Histogram

Bar Chart
A bar chart can be created using the bar method. The bar-chart isn’t automatically
calculating the frequency of a category so we are going to use pandas value_counts

function to do this. The bar-chart is useful for categorical data that doesn’t have a lot of
different categories (less than 30) because else it can get quite messy.

1 # create a figure and axis


2 fig, ax = plt.subplots()
3 # count the occurrence of each class
4 data = wine_reviews['points'].value_counts()
5 # get x and y data
6 points = data.index
7 frequency = data.values
8 # create bar chart
9 ax.bar(points, frequency)
10 # set title and labels
11 ax.set_title('Wine Review Scores')
https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 7/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

12 ax.set_xlabel('Points')
Get
13 started Open in app
ax.set_ylabel('Frequency')

matplotlib_simple_barchart.py hosted with ❤ by GitHub view raw

Figure 8: Bar-Chart

Pandas Visualization
Pandas is an open source high-performance, easy-to-use library providing data
structures, such as dataframes, and data analysis tools like the visualization tools we will
use in this article.

Pandas Visualization makes it really easy to create plots out of a pandas dataframe and
series. It also has a higher level API than Matplotlib and therefore we need less code for
the same results.

Pandas can be installed using either pip or conda.

pip install pandas


or
conda install pandas

Scatter Plot
https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 8/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

To create a scatter plot in Pandas we can call <dataset>.plot.scatter() and pass it two
Get started Open in app
arguments, the name of the x-column as well as the name of the y-column. Optionally
we can also pass it a title.

1 iris.plot.scatter(x='sepal_length', y='sepal_width', title='Iris Dataset')

pandas_simple_scatterplot.py hosted with ❤ by GitHub view raw

Figure 9: Scatter Plot

As you can see in the image it is automatically setting the x and y label to the column
names.

Line Chart
To create a line-chart in Pandas we can call <dataframe>.plot.line() . Whilst in

Matplotlib we needed to loop-through each column we wanted to plot, in Pandas we


don’t need to do this because it automatically plots all available numeric columns (at
least if we don’t specify a specific column/s).

1 iris.drop(['class'], axis=1).plot.line(title='Iris Dataset')

pandas_simple_linechart.py hosted with ❤ by GitHub view raw

https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 9/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

Get started Open in app

Figure 10: Line Chart

If we have more than one feature Pandas automatically creates a legend for us, as can be
seen in the image above.

Histogram
In Pandas, we can create a Histogram with the plot.hist method. There aren’t any
required arguments but we can optionally pass some like the bin size.

1 wine_reviews['points'].plot.hist()

pandas_simple_histogram.py hosted with ❤ by GitHub view raw

Figure 11: Histogram

It’s also really easy to create multiple histograms.

1 iris.plot.hist(subplots=True, layout=(2,2), figsize=(10, 10), bins=20)

pandas_multiple_histograms.py hosted with ❤ by GitHub view raw


https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 10/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

Get started Open in app

Figure 12: Multiple Histograms

The subplots argument specifies that we want a separate plot for each feature and the
layout specifies the number of plots per row and column.

Bar Chart
To plot a bar-chart we can use the plot.bar() method, but before we can call this we
need to get our data. For this we will first count the occurrences using the value_count()

method and then sort the occurrences from smallest to largest using the sort_index()

method.

1 wine_reviews['points'].value_counts().sort_index().plot.bar()


https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 11/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science
pandas_simple_barchart_vertical.py hosted with ❤ by GitHub view raw

Get started Open in app

Figure 13: Vertical Bar-Chart

It’s also really simple to make a horizontal bar-chart using the plot.barh() method.

1 wine_reviews['points'].value_counts().sort_index().plot.barh()

pandas_simple_barchart_horizontal.py hosted with ❤ by GitHub view raw

Figure 14: Horizontal Bar-Chart

We can also plot other data then the number of occurrences.

1 wine_reviews.groupby("country").price.mean().sort_values(ascending=False)[:5].plot.bar()

pandas_simple_barchart_vertical_2.py hosted with ❤ by GitHub view raw

https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 12/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

Get started Open in app

Figure 15: Countries with the most expensive wine(on average)

In the example above we grouped the data by country and then took the mean of the
wine prices, ordered it, and plotted the 5 countries with the highest average wine price.

Seaborn
Seaborn is a Python data visualization library based on Matplotlib. It provides a high-
level interface for creating attractive graphs.

Seaborn has a lot to offer. You can create graphs in one line that would take you multiple
tens of lines in Matplotlib. Its standard designs are awesome and it also has a nice
interface for working with pandas dataframes.

It can be imported by typing:

import seaborn as sns

Scatter plot
We can use the .scatterplot method for creating a scatterplot, and just as in Pandas we
need to pass it the column names of the x and y data, but now we also need to pass the

https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 13/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

data as an additional argument because we aren’t calling the function on the data
Get started Open in app
directly as we did in Pandas.

1 sns.scatterplot(x='sepal_length', y='sepal_width', data=iris)

seaborn_simple_scatterplot.py hosted with ❤ by GitHub view raw

Figure 16: Scatterplot

We can also highlight the points by class using the hue argument, which is a lot easier
than in Matplotlib.

Figure 17: Scatterplot colored by class

Line chart

https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 14/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

To create a line-chart the sns.lineplot method can be used. The only required
Get started Open in app
argument is the data, which in our case are the four numeric columns from the Iris
dataset. We could also use the sns.kdeplot method which rounds of the edges of the
curves and therefore is cleaner if you have a lot of outliers in your dataset.

Figure 18: Line Chart

Histogram
To create a histogram in Seaborn we use the sns.distplot method. We need to pass it
the column we want to plot and it will calculate the occurrences itself. We can also pass
it the number of bins, and if we want to plot a gaussian kernel density estimate inside
the graph.

Figure 19: Histogram

https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 15/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

Get started Open in app

Figure 20: Histogram with gaussian kernel density estimate

Bar chart
In Seaborn a bar-chart can be created using the sns.countplot method and passing it
the data.

Figure 21: Bar-Chart

Other graphs
Now that you have a basic understanding of the Matplotlib, Pandas Visualization and
Seaborn syntax I want to show you a few other graph types that are useful for extracting
insides.

For most of them, Seaborn is the go-to library because of its high-level interface that
allows for the creation of beautiful graphs in just a few lines of code.

Box plots
https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 16/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

A Box Plot is a graphical method of displaying the five-number summary. We can create
Get started Open in app
box plots using seaborns sns.boxplot method and passing it the data as well as the x
and y column name.

Figure 22: Boxplot

Box Plots, just like bar-charts are great for data with only a few categories but can get
messy really quickly.

Heatmap
A Heatmap is a graphical representation of data where the individual values contained
in a matrix are represented as colors. Heatmaps are perfect for exploring the correlation
of features in a dataset.

To get the correlation of the features inside a dataset we can call <dataset>.corr() ,

which is a Pandas dataframe method. This will give us the correlation matrix.

We can now use either Matplotlib or Seaborn to create the heatmap.

Matplotlib:

https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 17/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

Get started Open in app

Figure 23: Heatmap without annotations

To add annotations to the heatmap we need to add two for loops:

Figure 24: Heatmap with annotations

Seaborn makes it way easier to create a heatmap and add annotations:

Figure 25: Heatmap with annotations

https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 18/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

Faceting
Get started Open in app
Faceting is the act of breaking data variables up across multiple subplots and combining
those subplots into a single figure.

Faceting is really helpful if you want to quickly explore your dataset.

To use one kind of faceting in Seaborn we can use the FacetGrid . First of all, we need to

define the FacetGrid and pass it our data as well as a row or column, which will be used
to split the data. Then we need to call the map function on our FacetGrid object and
define the plot type we want to use, as well as the column we want to graph.

Figure 26: Facet-plot

You can make plots a lot bigger and more complicated than the example above. You can
find a few examples here.

Pairplot
Lastly, I will show you Seaborns pairplot and Pandas scatter_matrix , which enable
you to plot a grid of pairwise relationships in a dataset.

https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 19/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

Get started Open in app

Figure 27: Pairplot

https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 20/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

Get started Open in app

Figure 28: Scatter matrix

As you can see in the images above these techniques are always plotting two features
with each other. The diagonal of the graph is filled with histograms and the other plots
are scatter plots.

Recommended Reading

Scraping Reddit data


How to scrape data from Reddit using the Python Reddit API
Wrapper(PRAW)
towardsdatascience.com

Conclusion
Data visualization is the discipline of trying to understand data by placing it in a visual
context so that patterns, trends and correlations that might not otherwise be detected
can be exposed.

Python offers multiple great graphing libraries that come packed with lots of different
features. In this article, we looked at Matplotlib, Pandas visualization and Seaborn.

If you liked this article consider subscribing on my Youtube Channel and following me
on social media.
https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 21/22
2/11/2021 Introduction to Data Visualization in Python | by Gilbert Tanner | Towards Data Science

The code covered in this article is available as a Github Repository.


Get started Open in app

If you have any questions, recommendations or critiques, I can be reached via Twitter or
the comment section.

Sign up for The Daily Pick


By Towards Data Science

Hands-on real-world examples, research, tutorials, and cutting-edge techniques delivered Monday to
Thursday. Make learning your daily ritual. Take a look

Your email

Get this newsletter

By signing up, you will create a Medium account if you don’t already have one. Review our Privacy Policy for more information
about our privacy practices.

Thanks to TDS Editors and David Errington. 

Data Science Data Visualization Matplotlib Pandas Seaborn

About Help Legal

Get the Medium app

https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed 22/22

You might also like