
Data Visualization Using Seaborn - Towards Data Science

This document discusses data visualization using the Python library Seaborn. Seaborn is a data visualization library based on Matplotlib that provides a high-level interface for drawing attractive statistical graphics. It has beautiful default styles and works well with Pandas dataframes. The document demonstrates how to install Seaborn and import necessary libraries. It then shows examples of visualizing statistical relationships using scatter plots to depict the relationship between two variables using the tips dataset. Key plots include a scatter plot of capital loss vs capital gain from a census dataset and a relational plot of total bill vs tip from the tips dataset.


3/18/2019 Data Visualization using Seaborn – Towards Data Science

Data Visualization using Seaborn


Mohit Sharma
Nov 3, 2018 · 11 min read

“turned on at screen monitor” by Chris Liverani on Unsplash

I am back with the seaborn tutorial. Last time we learned about Data
Visualization using Matplotlib.

Seaborn is a Python data visualization library based on matplotlib. It
provides a high-level interface for drawing attractive and informative
statistical graphics.

Key Features
• Seaborn is a statistical plotting library

• It has beautiful default styles

• It is also designed to work very well with Pandas dataframe objects.

Installing and getting started

https://towardsdatascience.com/data-visualization-using-seaborn-fc24db95a850 1/31

To install the latest release of seaborn, you can use pip:

pip install seaborn

It’s also possible to install the released version using conda:

conda install seaborn

Alternatively, you can use pip to install the development version
directly from github:

pip install git+https://github.com/mwaskom/seaborn.git

Another option would be to clone the github repository and install
from your local copy:

pip install .

Dependencies
Python 2.7 or 3.5+

Mandatory dependencies
numpy (>= 1.9.3)
scipy (>= 0.14.0)
matplotlib (>= 1.4.3)
pandas (>= 0.15.2)

Recommended dependencies
statsmodels (>= 0.5.0)

Optional Reading
Testing
To test seaborn, run make test in the root directory of the source
distribution. This runs the unit test suite (using pytest, but many older
tests use nose asserts). It also runs the example code in function docstrings
to smoke-test a broader and more realistic range of example usage.
The full set of tests requires an internet connection to download the
example datasets (if they haven’t been previously cached), but the unit
tests should be possible to run offline.

Bugs


Please report any bugs you encounter through the github issue tracker. It
will be most helpful to include a reproducible example on one of the
example datasets (accessed through load_dataset()). It is difficult to debug
any issues without knowing the versions of seaborn and matplotlib you are
using, as well as what matplotlib backend you are using to draw the plots,
so please include those in your bug report.
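
The version and backend details requested above can be gathered in a few lines. This is only a minimal sketch: the helper name `plotting_environment` is made up for illustration and is not part of seaborn or matplotlib.

```python
import matplotlib
import seaborn as sns


def plotting_environment():
    """Collect the details a seaborn bug report asks for (hypothetical helper)."""
    return {
        "seaborn": sns.__version__,
        "matplotlib": matplotlib.__version__,
        "backend": matplotlib.get_backend(),
    }


# Print one "name: value" line per entry, ready to paste into an issue.
for name, value in plotting_environment().items():
    print("{}: {}".format(name, value))
```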

Note: This article assumes you are familiar with Python basics and
data visualization. If you still face any problem, do comment or email me
your query.

See also: Data Visualization Using Matplotlib


In [1]:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:


census_data = pd.read_csv('census_data.csv')
census_data.describe()

Out[2]:

Figure 1

In [3]:

census_data.head()

Out[3]:

Figure 2

In [4]:

census_data.info()


Out[4]:

Figure 3
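
The census_data.csv file is not bundled with seaborn, so readers without it may want a stand-in to follow along. The sketch below builds a synthetic frame with the column names used in this article. Everything in it is an assumption: the helper name `make_fake_census`, the value ranges, and the placeholder category labels are invented, and the plots it produces will not match the figures here.

```python
import numpy as np
import pandas as pd


def make_fake_census(n_rows=200, seed=0):
    """Return a synthetic frame with the article's column names (made-up values)."""
    rng = np.random.RandomState(seed)
    return pd.DataFrame({
        # Numeric columns: ranges are arbitrary choices for this sketch.
        "age": rng.randint(17, 91, size=n_rows),
        "hours_per_week": rng.randint(1, 100, size=n_rows),
        "capital_gain": rng.randint(0, 10000, size=n_rows),
        "capital_loss": rng.randint(0, 4000, size=n_rows),
        # Categorical columns: placeholder labels, not the real categories.
        "marital_status": rng.choice(["status_a", "status_b", "status_c"], size=n_rows),
        "relationship": rng.choice(["rel_a", "rel_b"], size=n_rows),
        "gender": rng.choice(["group_1", "group_2"], size=n_rows),
        "race": rng.choice(["cat_1", "cat_2", "cat_3"], size=n_rows),
        "income_bracket": rng.choice(["low", "high"], size=n_rows),
    })


fake_census = make_fake_census()
print(fake_census.head())
```

Assigning `census_data = make_fake_census()` should let the later cells run, although the output will of course be random noise rather than real census patterns.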

Visualizing Statistical Relationships


Statistical analysis is a process of understanding how variables in a
dataset relate to each other and how those relationships depend on
other variables. Visualization can be a core component of this process
because, when data are visualized properly, the human visual system
can see trends and patterns that indicate a relationship.

We will discuss most of the seaborn functions today:

Scatter plot
The scatter plot is a mainstay of statistical visualization. It depicts the
joint distribution of two variables using a cloud of points, where each
point represents an observation in the dataset. This depiction allows
the eye to infer a substantial amount of information about whether
there is any meaningful relationship between them.

There are several ways to draw a scatter plot in seaborn. The most
basic, which should be used when both variables are numeric, is the
scatterplot() function.


In [5]:

sns.scatterplot(x='capital_loss',y='capital_gain',data=census_data)

Out[5]:

<matplotlib.axes._subplots.AxesSubplot at 0xaafb2b0>

Figure 4

In [6]:

sns.set(style="darkgrid")
tips = sns.load_dataset("tips")  # tips is a built-in example dataset in seaborn
sns.relplot(x="total_bill", y="tip", data=tips);


Figure 5

Note: scatterplot() is the default kind in relplot() (it can also be
forced by setting kind="scatter").

In [7]:

# adding some additional parameters
sns.scatterplot(x='capital_loss',y='capital_gain',hue='marital_status',data=census_data)

# hue: Can be either categorical or numeric, although color mapping will
# behave differently in latter case.

Out[7]:

<matplotlib.axes._subplots.AxesSubplot at 0xac406a0>


Figure 6

In [8]:

sns.scatterplot(x='capital_loss',y='capital_gain',hue='marital_status',size='age',data=census_data)

Out[8]:

<matplotlib.axes._subplots.AxesSubplot at 0xadc95c0>

Figure 7


In [9]:

# As I said above, scatterplot() is the default kind in relplot()
# (it can also be forced by setting kind="scatter").
# See the difference:
sns.relplot(x='capital_loss',y='capital_gain',hue='marital_status',size='age',data=census_data)

Out[9]:

<seaborn.axisgrid.FacetGrid at 0xacdeb70>

Figure 8
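
The difference between the two calls shows up in what they return, as the Out[8] and Out[9] lines hint. The sketch below checks this on a small synthetic frame. The data values are made up, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for two census columns (values are made up).
rng = np.random.RandomState(0)
demo = pd.DataFrame({
    "capital_loss": rng.randint(0, 4000, size=50),
    "capital_gain": rng.randint(0, 10000, size=50),
})

# Axes-level function: draws onto one matplotlib Axes and returns that Axes.
fig, ax = plt.subplots()
returned_axes = sns.scatterplot(x="capital_loss", y="capital_gain", data=demo, ax=ax)

# Figure-level function: creates its own figure and returns a FacetGrid.
grid = sns.relplot(x="capital_loss", y="capital_gain", data=demo)

print(type(returned_axes).__name__)
print(type(grid).__name__)
```

In practice this means scatterplot() can be dropped into an existing matplotlib layout through `ax=`, while relplot() manages the whole figure and is the one to reach for when faceting.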

Line plot
Scatter plots are highly effective, but there is no universally optimal
type of visualization. Instead, the visual representation should be
adapted for the specifics of the dataset and to the question you are
trying to answer with the plot.

With some datasets, you may want to understand changes in one
variable as a function of time, or a similarly continuous variable. In this
situation, a good choice is to draw a line plot. In Seaborn, this can be
accomplished by the lineplot() function, either directly or with relplot()
by setting kind="line":

In [10]:

df = pd.DataFrame(dict(time=np.arange(500),
value=np.random.randn(500).cumsum()))
g = sns.relplot(x="time", y="value", kind="line", data=df)
g.fig.autofmt_xdate()

"""
Figure-level interface for drawing relational plots onto a FacetGrid.

This function provides access to several different axes-level functions
that show the relationship between two variables with semantic mappings
of subsets. The ``kind`` parameter selects the underlying axes-level
function to use:

- :func:`scatterplot` (with ``kind="scatter"``; the default)
- :func:`lineplot` (with ``kind="line"``)
"""

Out[10]:


Figure 9

In [11]:

age_vs_hours_per_week = sns.relplot(x="age",
y="hours_per_week", kind="line", data=census_data)

Out[11]:


Figure 10

In [12]:

age_vs_hours_per_week = sns.relplot(x="age",
y="hours_per_week", kind="line",sort=False,
data=census_data)

Out[12]:


Figure 11

Because lineplot() assumes that you are most often trying to draw y as a
function of x, the default behavior is to sort the data by the x values
before plotting. However, this can be disabled by passing sort=False, as
in the cell above.
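
To see what the sorting step amounts to, here is a pandas-only sketch with four made-up points recorded out of order. It does not call seaborn; it only mimics the order in which a line would connect the points with and without sorting by x.

```python
import pandas as pd

# Hypothetical observations recorded out of order along x.
points = pd.DataFrame({"x": [3, 1, 4, 2], "y": [30, 10, 40, 20]})

# Sorted by x (the default behaviour): points are joined left to right.
sorted_path = points.sort_values("x")[["x", "y"]].values.tolist()

# Unsorted (sort=False): points are joined in the order they appear in the data.
unsorted_path = points[["x", "y"]].values.tolist()

print(sorted_path)    # [[1, 10], [2, 20], [3, 30], [4, 40]]
print(unsorted_path)  # [[3, 30], [1, 10], [4, 40], [2, 20]]
```

The unsorted path doubles back on itself along x, which is why Figure 11 looks so much more tangled than Figure 10.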

Showing multiple relationships with facets


We’ve emphasized in this tutorial that, while these functions can show
several semantic variables at once, it’s not always effective to do so. But
what about when you do want to understand how a relationship
between two variables depends on more than one other variable?

The best approach may be to make more than one plot. Because
relplot() is based on the FacetGrid, this is easy to do. To show the
influence of an additional variable, instead of assigning it to one of the
semantic roles in the plot, use it to “facet” the visualization. This means
that you make multiple axes and plot subsets of the data on each of
them:

In [13]:

sns.relplot(x='capital_loss',y='capital_gain',hue='marital_status',size='age',col='gender',data=census_data)


Out[13]:

<seaborn.axisgrid.FacetGrid at 0xc8a1240>

Figure 12

In [14]:

sns.relplot(x='capital_loss',y='capital_gain',hue='marital_status',size='age',col='income_bracket',data=census_data)

Out[14]:

<seaborn.axisgrid.FacetGrid at 0xcdc25c0>

Figure 13


You can also show the influence of two variables this way: one by faceting
on the columns and one by faceting on the rows. As you start adding
more variables to the grid, you may want to decrease the figure size.
Remember that the size of a FacetGrid is parameterized by the height and
aspect ratio of each facet:

In [15]:

sns.relplot(x='capital_loss',y='capital_gain',hue='marital_status',size='age',col='income_bracket',row='race',height=5,data=census_data)

Out[15]:

<seaborn.axisgrid.FacetGrid at 0xcdc2320>

Figure 14


Figure 15

Figure 16
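
How the row and col variables determine the grid layout can be checked on a small synthetic frame. This is a sketch under assumptions: the column names `group` and `flag` and all values are invented, and the Agg backend is chosen only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.RandomState(1)
n = 120
demo = pd.DataFrame({
    "x": rng.normal(size=n),
    "y": rng.normal(size=n),
    "group": np.tile(["a", "b", "c"], n // 3),    # 3 levels -> 3 columns
    "flag": np.repeat(["yes", "no"], n // 2),     # 2 levels -> 2 rows
})

# Each facet is `height` inches tall and `height * aspect` inches wide.
g = sns.relplot(x="x", y="y", col="group", row="flag",
                height=2, aspect=1.2, data=demo)

# One Axes per (row level, column level) combination.
print(g.axes.shape)  # (2, 3)
```

Because the figure grows with the number of facets, lowering `height` is usually the simplest way to keep a large grid readable.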

Plotting with Categorical data


In the relational plot tutorial we saw how to use different visual
representations to show the relationship between multiple variables in
a dataset. In the examples, we focused on cases where the main
relationship was between two numerical variables. If one of the main
variables is “categorical” (divided into discrete groups) it may be
helpful to use a more specialized approach to visualization.


In seaborn, there are several different ways to visualize a relationship
involving categorical data. Similar to the relationship between relplot()
and either scatterplot() or lineplot(), there are two ways to make these
plots. There are a number of axes-level functions for plotting
categorical data in different ways and a figure-level interface, catplot(),
that gives unified higher-level access to them.

It’s helpful to think of the different categorical plot kinds as belonging
to three different families, which we’ll discuss in detail below. They are:

Categorical scatterplots:

• stripplot() (with kind="strip"; the default)
• swarmplot() (with kind="swarm")

Categorical distribution plots:

• boxplot() (with kind="box")
• violinplot() (with kind="violin")
• boxenplot() (with kind="boxen")

Categorical estimate plots:

• pointplot() (with kind="point")
• barplot() (with kind="bar")
• countplot() (with kind="count")
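
The list above can be written down as a lookup table from each catplot() kind to its axes-level function. The dictionary name `kind_to_function` is only illustrative; the sketch simply confirms that each named function is exposed by the installed seaborn.

```python
import seaborn as sns

# Mapping from the `kind` argument of catplot() to the axes-level function name.
kind_to_function = {
    "strip": "stripplot",
    "swarm": "swarmplot",
    "box": "boxplot",
    "violin": "violinplot",
    "boxen": "boxenplot",
    "point": "pointplot",
    "bar": "barplot",
    "count": "countplot",
}

for kind, name in kind_to_function.items():
    func = getattr(sns, name)  # look the function up on the seaborn module
    print("kind={!r:10} -> sns.{}()".format(kind, func.__name__))
```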

These families represent the data using different levels of granularity.

The default representation of the data in catplot() uses a scatterplot.
There are actually two different categorical scatter plots in seaborn.
They take different approaches to resolving the main challenge in
representing categorical data with a scatter plot, which is that all of the
points belonging to one category would fall on the same position along
the axis corresponding to the categorical variable. The approach used
by stripplot(), which is the default “kind” in catplot(), is to adjust the
positions of points on the categorical axis with a small amount of
random “jitter”:

In [16]:


sns.catplot(x="age",y="marital_status",data=census_data)

Out[16]:

<seaborn.axisgrid.FacetGrid at 0xdb18470>

Figure 17
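
The idea of jitter can be sketched with numpy alone. This is not seaborn's implementation: the uniform noise and the width of 0.1 are assumptions chosen for illustration, and the actual distribution and width that stripplot() uses may differ.

```python
import numpy as np

rng = np.random.RandomState(0)

# Three hypothetical categories placed at integer positions 0, 1 and 2,
# with five observations each.
category_positions = np.repeat([0, 1, 2], 5)

# Small random offsets along the categorical axis (width chosen arbitrarily).
jitter = rng.uniform(-0.1, 0.1, size=category_positions.size)
jittered = category_positions + jitter

print(np.round(jittered, 3))
```

Each point stays close enough to its category position to be read as belonging to it, while points that would otherwise sit on top of each other are spread apart.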

The second approach adjusts the points along the categorical axis using
an algorithm that prevents them from overlapping. It can give a better
representation of the distribution of observations, although it only
works well for relatively small datasets. This kind of plot is sometimes
called a “beeswarm” and is drawn in seaborn by swarmplot(), which is
activated by setting kind="swarm" in catplot():

In [27]:

#sns.catplot(x="age",y="relationship",kind='swarm',data=census_data)
# or
#sns.swarmplot(x="relationship",y="age",data=census_data)
sns.catplot(x="day", y="total_bill", kind="swarm", data=tips);

Out[27]:

Figure 18

Similar to the relational plots, it’s possible to add another dimension to
a categorical plot by using a hue semantic. (The categorical plots do not
currently support size or style semantics). Each different categorical
plotting function handles the hue semantic differently. For the scatter
plots, it is only necessary to change the color of the points:

In [29]:

sns.catplot(x="day", y="total_bill", hue="sex", kind="swarm", data=tips);

Out[29]:


Figure 19

Box plot
The first is the familiar boxplot(). This kind of plot shows the three
quartile values of the distribution along with extreme values. The
“whiskers” extend to points that lie within 1.5 IQRs of the lower and
upper quartile, and then observations that fall outside this range are
displayed independently. This means that each value in the boxplot
corresponds to an actual observation in the data.
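
The quantities described above can be computed by hand with numpy. The sample values below are made up, and this is a sketch of the 1.5 IQR rule as stated in the text rather than seaborn's or matplotlib's own code.

```python
import numpy as np

values = np.array([2, 4, 5, 7, 8, 9, 10, 12, 30])  # made-up sample

# The three quartile values drawn as the box and its centre line.
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Whiskers stop at the most extreme observations within 1.5 IQRs of the quartiles.
lower_whisker = values[values >= q1 - 1.5 * iqr].min()
upper_whisker = values[values <= q3 + 1.5 * iqr].max()

# Anything beyond the whiskers is drawn as an individual point.
outliers = values[(values < lower_whisker) | (values > upper_whisker)]

print(q1, median, q3)                # 5.0 8.0 10.0
print(lower_whisker, upper_whisker)  # 2 12
print(outliers)                      # [30]
```

Note that the whisker ends (2 and 12) are themselves observations from the sample, which is what the last sentence of the paragraph above is pointing out.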

In [32]:

sns.catplot(x="age",y="marital_status",kind='box',data=census_data)

Out[32]:

<seaborn.axisgrid.FacetGrid at 0xd411860>


Figure 20

When adding a hue semantic, the box for each level of the semantic
variable is moved along the categorical axis so they don’t overlap:

In [37]:

sns.catplot(x="age",y="marital_status",kind='box',hue='gender',data=census_data)

Out[37]:

<seaborn.axisgrid.FacetGrid at 0xde8a8d0>


Figure 21

Violin plots
A different approach is a violinplot(), which combines a boxplot with
the kernel density estimation procedure described in the distributions
tutorial:

In [38]:

sns.catplot(x="age",y="marital_status",kind='violin',data=census_data)

Out[38]:

<seaborn.axisgrid.FacetGrid at 0x184c4080>


Figure 22

This approach uses the kernel density estimate to provide a richer
description of the distribution of values. Additionally, the quartile and
whisker values from the boxplot are shown inside the violin. The
downside is that, because the violinplot uses a KDE, there are some
other parameters that may need tweaking, adding some complexity
relative to the straightforward boxplot:

In [41]:

sns.catplot(x="age",y="marital_status",kind='violin',bw=.15,cut=0,data=census_data)

Out[41]:

<seaborn.axisgrid.FacetGrid at 0xfdea320>


Figure 23
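
Why the bandwidth matters can be seen from a hand-rolled Gaussian KDE. This is a plain-numpy sketch, not what seaborn runs internally: the function name `gaussian_kde_sketch` and the sample values are invented, and the bandwidth here is expressed in data units, which is not necessarily how the `bw` argument above is scaled. Evaluating the density only between the smallest and largest observation mirrors the idea behind cut=0.

```python
import numpy as np


def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample (illustrative only)."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)


samples = np.array([20.0, 22.0, 23.0, 40.0, 41.0, 60.0])  # made-up ages

# Evaluate only within the observed range, in the spirit of cut=0.
grid = np.linspace(samples.min(), samples.max(), 200)

narrow = gaussian_kde_sketch(samples, grid, bandwidth=1.0)   # spiky, follows each point
wide = gaussian_kde_sketch(samples, grid, bandwidth=10.0)    # smooth, blurs clusters

print(narrow.max() > wide.max())  # True
```

A small bandwidth produces tall, narrow peaks around individual observations, while a large one smooths them into a single broad shape, so the right setting depends on how much detail the data can support.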

Statistical estimation within categories


For other applications, rather than showing the distribution within
each category, you might want to show an estimate of the central
tendency of the values. Seaborn has two main ways to show this
information. Importantly, the basic API for these functions is identical
to that for the ones discussed above.

Bar plots
A familiar style of plot that accomplishes this goal is a bar plot. In
seaborn, the barplot() function operates on a full dataset and applies a
function to obtain the estimate (taking the mean by default). When
there are multiple observations in each category, it also uses
bootstrapping to compute a confidence interval around the estimate
and plots that using error bars:

In [46]:

sns.catplot(x="income_bracket",y="age",kind='bar',data=census_data)


Out[46]:

<seaborn.axisgrid.FacetGrid at 0x160588d0>

Figure 24

In [47]:

sns.catplot(x="income_bracket",y="age",kind='bar',hue='gender',data=census_data)

Out[47]:

<seaborn.axisgrid.FacetGrid at 0xdf262e8>


Figure 25
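
The error bars on these bars come from bootstrapping, which can be sketched in a few lines of numpy. The ages below are made up, and the choices of 1000 resamples and a 95% percentile interval are assumptions for this illustration rather than a statement of seaborn's exact settings.

```python
import numpy as np

rng = np.random.RandomState(42)
ages = np.array([23, 31, 35, 38, 41, 44, 47, 52, 58, 63], dtype=float)  # made up

# Resample the data with replacement many times and record each mean.
n_boot = 1000
boot_means = np.array([
    rng.choice(ages, size=ages.size, replace=True).mean()
    for _ in range(n_boot)
])

# The bar height is the plain mean; the error bar spans the middle 95%
# of the bootstrapped means.
estimate = ages.mean()
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

print("mean = {:.1f}".format(estimate))
print("95% CI = ({:.1f}, {:.1f})".format(ci_low, ci_high))
```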

A special case for the bar plot is when you want to show the number of
observations in each category rather than computing a statistic for a
second variable. This is similar to a histogram over a categorical, rather
than quantitative, variable. In seaborn, it’s easy to do so with the
countplot() function:

In [61]:

ax = sns.catplot(x='marital_status',kind='count',data=census_data,orient="h")
ax.fig.autofmt_xdate()

Out[61]:


Figure 26
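
The numbers behind a count plot are simply per-category tallies, which pandas can produce directly. The category labels below are invented for illustration and are not taken from the census file.

```python
import pandas as pd

# Hypothetical categorical column.
status = pd.Series(
    ["Married", "Never-married", "Married", "Divorced", "Married", "Never-married"],
    name="marital_status",
)

# One count per category: these are the bar heights a count plot would draw.
counts = status.value_counts()
print(counts)
```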

Point plots
An alternative style for visualizing the same information is offered by
the pointplot() function. This function also encodes the value of the
estimate with height on the other axis, but rather than showing a full
bar, it plots the point estimate and confidence interval. Additionally,
pointplot() connects points from the same hue category. This makes it
easy to see how the main relationship is changing as a function of the
hue semantic because your eyes are quite good at picking up on
differences of slopes:

In [67]:

ax = sns.catplot(x='marital_status',y='age',hue='relationship',kind='point',data=census_data)
ax.fig.autofmt_xdate()

Out[67]:


Figure 27
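
The slopes that a point plot makes visible are differences between per-category means within each hue level. The pandas sketch below computes them for a tiny made-up table; the column names echo the tips dataset, but the values are invented.

```python
import pandas as pd

demo = pd.DataFrame({
    "day": ["Thu", "Thu", "Fri", "Fri", "Thu", "Thu", "Fri", "Fri"],
    "sex": ["F", "F", "F", "F", "M", "M", "M", "M"],
    "total_bill": [10.0, 14.0, 16.0, 20.0, 18.0, 22.0, 19.0, 21.0],
})

# One point estimate (the mean) per combination of hue level and category.
means = demo.groupby(["sex", "day"])["total_bill"].mean().unstack("day")[["Thu", "Fri"]]

# The line joining the two points of a hue level has this rise.
slopes = means["Fri"] - means["Thu"]

print(means)
print(slopes)  # F: 6.0, M: 0.0
```

Here the line for F would climb from Thu to Fri while the line for M stays flat, and that contrast is what the connected points make easy to spot.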

Showing multiple relationships with facets


Just like relplot(), the fact that catplot() is built on a FacetGrid means
that it is easy to add faceting variables to visualize higher-dimensional
relationships:

In [78]:

sns.catplot(x="age", y="marital_status",
hue="income_bracket",
col="gender", aspect=.6,
kind="box", data=census_data);

Out[78]:


Figure 28


Full article on TheMenYouWantToBe …


To fork this notebook, go to GitHub.

If you like this article, do share with others.

Thank you~
