Data Visualization Using Seaborn - Towards Data Science
I am back with the seaborn tutorial. Last time we learned about data
visualization using Matplotlib.
Key Features
• Seaborn is a statistical plotting library
https://towardsdatascience.com/data-visualization-using-seaborn-fc24db95a850 1/31
3/18/2019 Data Visualization using Seaborn – Towards Data Science
Mandatory dependencies
numpy (>= 1.9.3)
scipy (>= 0.14.0)
matplotlib (>= 1.4.3)
pandas (>= 0.15.2)
Recommended dependencies
statsmodels (>= 0.5.0)
Optional Reading
Testing
To test seaborn, run make test in the root directory of the source
distribution. This runs the unit test suite (using pytest, but many older
tests use nose asserts). It also runs the example code in function docstrings
to smoke-test a broader and more realistic range of example usage.
The full set of tests requires an internet connection to download the
example datasets (if they haven’t been previously cached), but the unit
tests should be possible to run offline.
Bugs
Please report any bugs you encounter through the github issue tracker. It
will be most helpful to include a reproducible example on one of the
example datasets (accessed through load_dataset()). It is difficult to debug
any issues without knowing the versions of seaborn and matplotlib you are
using, as well as what matplotlib backend you are using to draw the plots,
so please include those in your bug report.
Note: This article assumes you are familiar with Python basics and
data visualization. If you still face any problems, comment or email me
your query.
In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
census_data = pd.read_csv('census_data.csv')
census_data.describe()
Out[2]:
Figure 1
In [3]:
census_data.head()
Out[3]:
Figure 2
In [4]:
census_data.info()
Out[4]:
Figure 3
Scatter plot
The scatter plot is a mainstay of statistical visualization. It depicts the
joint distribution of two variables using a cloud of points, where each
point represents an observation in the dataset. This depiction allows
the eye to infer a substantial amount of information about whether
there is any meaningful relationship between them.
There are several ways to draw a scatter plot in seaborn. The most
basic, which should be used when both variables are numeric, is the
scatterplot() function.
In [5]:
sns.scatterplot(x='capital_loss', y='capital_gain', data=census_data)
Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0xaafb2b0>
Figure 4
In [6]:
sns.set(style="darkgrid")
tips = sns.load_dataset("tips")  # tips is a built-in dataset in seaborn
sns.relplot(x="total_bill", y="tip", data=tips);
Figure 5
Note: scatterplot() is the default kind in relplot() (it can also be
forced by setting kind="scatter"):
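As a quick check of the note above, the sketch below draws the same data twice, once with the default kind and once with kind="scatter", and counts the plotted points. The toy_tips frame and its values are made up for illustration (an assumption, not the real tips dataset), so the example runs without a download.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips data (assumption: made-up values).
rng = np.random.default_rng(0)
bills = rng.uniform(5, 50, size=100)
toy_tips = pd.DataFrame({
    "total_bill": bills,
    "tip": 0.15 * bills + rng.normal(0, 1, size=100),
})

# Default kind versus an explicit kind="scatter".
g_default = sns.relplot(x="total_bill", y="tip", data=toy_tips)
g_scatter = sns.relplot(x="total_bill", y="tip", kind="scatter", data=toy_tips)

# Each grid holds one Axes whose points live in a single collection.
n_default = len(g_default.ax.collections[0].get_offsets())
n_scatter = len(g_scatter.ax.collections[0].get_offsets())
```

Both calls should produce the same scatter of 100 points, which is why the kind argument is usually left out for scatter plots.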
In [7]:
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0xac406a0>
Figure 6
In [8]:
sns.scatterplot(x='capital_loss', y='capital_gain', hue='marital_status',
                size='age', data=census_data)
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0xadc95c0>
Figure 7
In [9]:
Out[9]:
<seaborn.axisgrid.FacetGrid at 0xacdeb70>
Figure 8
Line plot
Scatter plots are highly effective, but there is no universally optimal
type of visualization. Instead, the visual representation should be
adapted for the specifics of the dataset and to the question you are
trying to answer with the plot.
In [10]:
df = pd.DataFrame(dict(time=np.arange(500),
                       value=np.random.randn(500).cumsum()))
# relplot() is the figure-level interface for drawing relational plots
# onto a FacetGrid.
g = sns.relplot(x="time", y="value", kind="line", data=df)
g.fig.autofmt_xdate()
Out[10]:
Figure 9
In [11]:
age_vs_hours_per_week = sns.relplot(x="age",
                                    y="hours_per_week", kind="line",
                                    data=census_data)
Out[11]:
Figure 10
In [12]:
age_vs_hours_per_week = sns.relplot(x="age",
y="hours_per_week", kind="line",sort=False,
data=census_data)
Out[12]:
Figure 11
The best approach may be to make more than one plot. Because
relplot() is based on the FacetGrid, this is easy to do. To show the
influence of an additional variable, instead of assigning it to one of the
semantic roles in the plot, use it to “facet” the visualization. This means
that you make multiple axes and plot subsets of the data on each of
them:
In [13]:
sns.relplot(x='capital_loss', y='capital_gain', hue='marital_status',
            size='age', col='gender', data=census_data)
Out[13]:
<seaborn.axisgrid.FacetGrid at 0xc8a1240>
Figure 12
In [14]:
sns.relplot(x='capital_loss', y='capital_gain', hue='marital_status',
            size='age', col='income_bracket', data=census_data)
Out[14]:
<seaborn.axisgrid.FacetGrid at 0xcdc25c0>
Figure 13
You can also show the influence of two variables this way: one by faceting
on the columns and one by faceting on the rows. As you start adding
more variables to the grid, you may want to decrease the figure size.
Remember that the size of a FacetGrid is parameterized by the height and
aspect ratio of each facet:
In [15]:
sns.relplot(x='capital_loss', y='capital_gain', hue='marital_status',
            size='age', col='income_bracket', row='race', height=5,
            data=census_data)
Out[15]:
<seaborn.axisgrid.FacetGrid at 0xcdc2320>
Figure 14
Figure 15
Figure 16
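To make the link between height, aspect and the overall figure size concrete, here is a small sketch on synthetic data. The column names group_col and group_row and all values are hypothetical. It assumes that, with no legend, the figure is about ncols × height × aspect inches wide and nrows × height inches tall.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data with 3 column levels and 2 row levels (made-up names).
rng = np.random.default_rng(2)
n = 120
toy = pd.DataFrame({
    "x": rng.normal(size=n),
    "y": rng.normal(size=n),
    "group_col": np.tile(["a", "b", "c"], n // 3),
    "group_row": np.repeat(["u", "v"], n // 2),
})

# Each facet is 2 inches tall and 2 * 1.5 = 3 inches wide.
g = sns.relplot(x="x", y="y", col="group_col", row="group_row",
                height=2, aspect=1.5, data=toy)

grid_shape = g.axes.shape                        # (rows, columns) of facets
fig_width, fig_height = g.fig.get_size_inches()  # overall figure size
```

Lowering height is therefore the simplest way to keep a grid with many facets readable on one screen.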
Categorical scatterplots:
In [16]:
sns.catplot(x="age",y="marital_status",data=census_data)
Out[16]:
<seaborn.axisgrid.FacetGrid at 0xdb18470>
Figure 17
The second approach adjusts the points along the categorical axis using
an algorithm that prevents them from overlapping. It can give a better
representation of the distribution of observations, although it only
works well for relatively small datasets. This kind of plot is sometimes
called a “beeswarm” and is drawn in seaborn by swarmplot(), which is
activated by setting kind="swarm" in catplot():
In [27]:
#sns.catplot(x="age",y="relationship",kind='swarm',data=census_data)
# or
#sns.swarmplot(x="relationship",y="age",data=census_data)
sns.catplot(x="day", y="total_bill", kind="swarm", data=tips);
Out[27]:
Figure 18
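For comparison, the sketch below puts the two categorical scatter styles side by side: stripplot(), the default kind in catplot(), which spreads points with random jitter, and swarmplot(), which places them so they do not overlap. The toy frame is synthetic (made-up values standing in for the tips data), and the axes-level functions are used here only so both plots share one figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips data: 20 bills for each of 3 days.
rng = np.random.default_rng(3)
toy = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat"], 20),
    "total_bill": rng.uniform(5, 50, size=60).round(2),
})

fig, (ax_strip, ax_swarm) = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
sns.stripplot(x="day", y="total_bill", data=toy, ax=ax_strip)  # random jitter
sns.swarmplot(x="day", y="total_bill", data=toy, ax=ax_swarm)  # no overlap

# Both plots draw every observation; only the horizontal placement differs.
n_strip = sum(len(c.get_offsets()) for c in ax_strip.collections)
n_swarm = sum(len(c.get_offsets()) for c in ax_swarm.collections)
```

The swarm layout gets slow and crowded as the number of points per category grows, which is why it suits relatively small datasets.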
In [29]:
Out[29]:
Figure 19
Box plot
The first is the familiar boxplot(). This kind of plot shows the three
quartile values of the distribution along with extreme values. The
“whiskers” extend to points that lie within 1.5 IQRs of the lower and
upper quartile, and then observations that fall outside this range are
displayed independently. This means that each value in the boxplot
corresponds to an actual observation in the data.
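The numbers behind a box plot can be reproduced by hand. The following is a minimal sketch using NumPy on a small made-up sample (the values are an assumption for illustration), computing the three quartiles, the 1.5 IQR limits, the whisker ends, and the points drawn independently.

```python
import numpy as np

# Made-up sample with one obvious extreme value.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)

# The three quartile values that define the box and its center line.
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Whiskers may reach at most 1.5 IQRs beyond the lower and upper quartile.
low_limit = q1 - 1.5 * iqr
high_limit = q3 + 1.5 * iqr

# Whisker ends are the most extreme observations still inside the limits.
inside = values[(values >= low_limit) & (values <= high_limit)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything outside the limits is displayed as an individual point.
outliers = values[(values < low_limit) | (values > high_limit)]
```

Here the whiskers stop at 1 and 9, which are real observations, and the value 30 is shown as a separate point.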
In [32]:
sns.catplot(x="age",y="marital_status",kind='box',data=census_data)
Out[32]:
<seaborn.axisgrid.FacetGrid at 0xd411860>
Figure 20
When adding a hue semantic, the box for each level of the semantic
variable is moved along the categorical axis so they don’t overlap:
In [37]:
sns.catplot(x="age",y="marital_status",kind='box',hue='gender',data=census_data)
Out[37]:
<seaborn.axisgrid.FacetGrid at 0xde8a8d0>
Figure 21
Violin plots
A different approach is a violinplot(), which combines a boxplot with
the kernel density estimation procedure described in the distributions
tutorial:
In [38]:
sns.catplot(x="age",y="marital_status",kind='violin',data=census_data)
Out[38]:
<seaborn.axisgrid.FacetGrid at 0x184c4080>
Figure 22
In [41]:
sns.catplot(x="age",y="marital_status",kind='violin',bw=.15,
            cut=0,data=census_data)
Out[41]:
<seaborn.axisgrid.FacetGrid at 0xfdea320>
Figure 23
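The bw and cut arguments in the last cell shape the density curve of the violin. As a rough illustration (not seaborn's own code), the sketch below evaluates a Gaussian kernel density estimate with NumPy. It assumes that bw acts as a scale factor on the sample standard deviation and that cut extends the evaluation grid by that many bandwidths past the extreme observations. The helper kde_on_grid and the ages sample are hypothetical.

```python
import numpy as np

def kde_on_grid(samples, bw, cut, gridsize=200):
    """Gaussian KDE evaluated on a grid reaching `cut` bandwidths past the data."""
    bandwidth = bw * samples.std(ddof=1)  # assumption: bw scales the sample std
    grid = np.linspace(samples.min() - cut * bandwidth,
                       samples.max() + cut * bandwidth, gridsize)
    # Average one Gaussian bump per observation at every grid position.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel_sum = np.exp(-0.5 * z ** 2).sum(axis=1)
    density = kernel_sum / (len(samples) * bandwidth * np.sqrt(2 * np.pi))
    return grid, density

rng = np.random.default_rng(4)
ages = rng.normal(loc=40, scale=12, size=50)  # made-up ages

# cut=0 keeps the curve inside the observed range; cut=2 lets it extend past it.
grid_cut0, dens_cut0 = kde_on_grid(ages, bw=0.15, cut=0)
grid_cut2, dens_cut2 = kde_on_grid(ages, bw=0.15, cut=2)
```

A small bw gives a bumpier curve that follows individual observations more closely, and cut=0 avoids suggesting values (such as ages) outside the range that was actually observed.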
Bar plots
Rather than showing the distribution within each category, you might
want to show an estimate of the central tendency of the values. A
familiar style of plot that accomplishes this goal is a bar plot. In
seaborn, the barplot() function operates on a full dataset and applies a
function to obtain the estimate (taking the mean by default). When
there are multiple observations in each category, it also uses
bootstrapping to compute a confidence interval around the estimate
and plots that using error bars:
In [46]:
sns.catplot(x="income_bracket",y="age",kind='bar',data=census_data)
Out[46]:
<seaborn.axisgrid.FacetGrid at 0x160588d0>
Figure 24
In [47]:
sns.catplot(x="income_bracket",y="age",kind='bar',hue='gender',data=census_data)
Out[47]:
<seaborn.axisgrid.FacetGrid at 0xdf262e8>
Figure 25
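The error bars on these bars come from bootstrapping. A minimal sketch of the idea for a single category, assuming a 95% percentile interval over 1000 resamples and using made-up ages rather than the census data:

```python
import numpy as np

rng = np.random.default_rng(0)
ages = rng.normal(loc=40, scale=12, size=200)  # made-up ages for one category

# The bar height is the estimate itself (the mean by default).
estimate = ages.mean()

# Resample with replacement many times and recompute the mean each time.
n_boot = 1000
boot_means = np.array([
    rng.choice(ages, size=len(ages), replace=True).mean()
    for _ in range(n_boot)
])

# The middle 95% of the resampled means gives the error bar limits.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
```

The interval narrows as the number of observations in the category grows, so short error bars indicate a mean that is estimated precisely, not a narrow spread of the raw values.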
A special case for the bar plot is when you want to show the number of
observations in each category rather than computing a statistic for a
second variable. This is similar to a histogram over a categorical, rather
than quantitative, variable. In seaborn, it’s easy to do so with the
countplot() function:
In [61]:
ax = sns.catplot(x='marital_status', kind='count', data=census_data,
                 orient="h")
ax.fig.autofmt_xdate()
Out[61]:
Figure 26
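The bar heights in a count plot are simply the number of rows per category, which can be checked with pandas. The short status series below is made up for illustration:

```python
import pandas as pd

# Made-up marital status values (assumption, not the census data).
status = pd.Series(
    ["Married", "Single", "Married", "Divorced", "Single", "Married"],
    name="marital_status",
)

# One count per category: the same numbers a count plot draws as bars.
counts = status.value_counts()
```

On the real data, census_data['marital_status'].value_counts() gives the exact numbers behind the figure.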
Point plots
An alternative style for visualizing the same information is offered by
the pointplot() function. This function also encodes the value of the
estimate with height on the other axis, but rather than showing a full
bar, it plots the point estimate and confidence interval. Additionally,
pointplot() connects points from the same hue category. This makes it
easy to see how the main relationship is changing as a function of the
hue semantic because your eyes are quite good at picking up on
differences of slopes:
In [67]:
ax = sns.catplot(x='marital_status', y='age', hue='relationship',
                 kind='point', data=census_data)
ax.fig.autofmt_xdate()
Out[67]:
Figure 27
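The points that pointplot() connects are per-group estimates (means by default), so the slopes it shows can be previewed with a pandas groupby. This is a sketch on a tiny made-up frame; the values are assumptions, and gender is used as the hue variable to keep the table small.

```python
import pandas as pd

# Made-up rows: two observations per (marital_status, gender) pair.
toy = pd.DataFrame({
    "marital_status": ["Married"] * 4 + ["Single"] * 4,
    "gender": ["Female", "Female", "Male", "Male"] * 2,
    "age": [30, 40, 50, 46, 22, 26, 28, 32],
})

# Rows are x positions, columns are hue levels, cells are the plotted means.
point_estimates = (
    toy.groupby(["marital_status", "gender"])["age"].mean().unstack("gender")
)

# Change in mean age from "Married" to "Single" for each hue level,
# i.e. the slope of each connecting line.
slopes = point_estimates.loc["Single"] - point_estimates.loc["Married"]
```

In this toy example the Male line drops more steeply than the Female line (−18 versus −11), which is the kind of difference the connected points make easy to spot.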
In [78]:
sns.catplot(x="age", y="marital_status",
hue="income_bracket",
col="gender", aspect=.6,
kind="box", data=census_data);
Out[78]:
Figure 28
Thank you~